All-hazards dataset mined from the US National Incident Management System 1999–2020

This paper describes a dataset mined from the public archive (1999–2020) of the US National Incident Management System Incident Status Summary (ICS-209) forms (a total of 187,160 reports for 35,170 incidents, including 34,478 wildland fires). This system captures detailed daily/regular information on incident development and response, including social and economic impacts. Most (98.4%) reports are wildland fire-related, with other incident types including hurricane, hazardous materials, flood, tornado, search and rescue, civil unrest, and winter storms. The archive, although publicly available, has been difficult to use for research due to multiple record formats, inconsistent data entry, and no clean pathway from individual reports to high-level incident analysis. Here, we describe the open-source, reproducible methods used to produce a science-grade version of the data, including formal connections made to other published wildland fire data products. Among other applications, this integrated and spatially augmented dataset enables exploration of the daily progression of the most costly, damaging, and deadly environmental-hazard events in recent US history.


Background & Summary
There has been a steady rise in the occurrence of billion-dollar environmental disasters in the United States (US) since the 1980s, with the past five years (2017-2021) setting historic highs. The average number of billion-dollar disasters across this period has more than doubled (from 7.1 to 17.8 events per year) 1 . Further, cumulative costs in 2017 set a new annual record of $306.2 billion 2 . There is evidence that the frequency and magnitude of environmental hazards is changing, with some extreme weather events linked to anthropogenic climate change 3 , such as the intensity of tropical storms 4 . Within just the past few decades, the area burned by wildfires in the western US has increased at least threefold [5][6][7][8] , with a strong climate change influence in forest systems 9 . This is a critical moment to develop new methods and data products to help understand the interrelationships between the physical and environmental characteristics of environmental hazards, incident response and management actions, and the societal impacts of large-scale or otherwise significant events 10-12 . Wildfires and hurricanes are two environmental hazards that have significant societal impacts and require costly and complex incident response. In the last five years, damages from wildfires alone have exceeded $81 billion 1 , destroyed over 60k structures 13 , resulted in 233 deaths 1 , and current suppression costs average $2-3 billion each year 14 . Thirteen of the 20 most destructive California wildfires occurred in the past five years (2017-2021), killing 148 people and destroying 40,235 structures 15 . During that same five-year period, there were 18 separate billion-dollar hurricanes that made landfall in the US with an inflation-adjusted total loss of $496.2 billion and 3,474 fatalities 1 . 
While not all become disasters with great societal impacts, all potentially significant hazard events require a government-coordinated response with critical documentation about how each event is unfolding and its threats to life, property, and other valued resources and assets. In this paper we use the term wildfire to mean fires in wildland fuels caused by unplanned ignitions (natural or human-caused). The broader term wildland fire includes fires from planned ignitions (i.e., controlled or prescribed burns). We use the term hazard
History of ICS. The ICS-209 form, commonly referred to as a sitrep, is part of the US National Incident Management System/Incident Command System (NIMS/ICS). The earliest implementation of ICS was developed by the US Forest Service (USFS) following a devastating fire in California in 1970 that claimed 16 lives, destroyed over 700 structures and burned over 500k acres. Numerous communication and coordination issues hampered the effectiveness of the agencies involved, resulting in a congressional mandate requiring the USFS to design a new system to facilitate interagency coordination and to support the allocation of suppression resources in dynamic, multi-fire situations 17,32,33 . The USFS worked in collaboration with California state agencies to produce FIRESCOPE (FIrefighting RESources of California Organized for Potential Emergencies) with two key components: the Incident Command System (ICS) and the Multi-Agency Coordination System (MACS) 32 . By 1981, FIRESCOPE was used by agencies throughout Southern California and was adapted for non-fire use. In parallel with this effort, the NWCG adopted and revised the FIRESCOPE ICS documentation to create the National Interagency Incident Management System (NIIMS) Incident Command System Operational System Description (ICS 120-1), a document that was collectively maintained by CalFire and NWCG. This document later served as the basis for NIMS ICS 33 .
Following the 2001 September 11th terrorist attacks in the US, the Department of Homeland Security (DHS) was formed, and on February 28, 2003, President George W. Bush issued Presidential Directive-5 34 calling for the establishment of a single, comprehensive national incident management system, which became NIMS.

NIMS ICS. The NIMS was issued in March 2004 to enable responders at all jurisdictional levels and disciplines to work together more effectively by establishing a single, comprehensive national incident management structure 17,33 . In 2005, there was a push to institutionalize the use of ICS across the entire response system and by 2006, federal funding for state, local and tribal grants was tied directly to compliance with the NIMS 33 . The NIMS/ICS is built upon existing incident management best practices including ICS and MACS. It fully delineates standardized command and control structures and procedures designed to support interoperability among jurisdictions and across disciplines as the complexity of a response effort increases. The planning function is centralized within ICS with information captured during each operational period flowing up to the tactical and strategic planning level 17 .
The ICS 209 incident status summary. The Federal Emergency Management Agency (FEMA) describes the purpose of the ICS-209 as follows: "The ICS 209 is used for reporting information on significant incidents … The ICS 209 contains basic information elements to support decision making at all levels above the incident to support the incident. Decision makers may include the agency having jurisdiction, but also all multiagency coordination system (MACS) elements and parties, such as cooperating and assisting agencies/organizations, dispatch centers, emergency operation centers, administrators, elected officials, and local, tribal, county, State, and Federal agencies. " 16 . The ICS-209 is described as providing a "snapshot in time, " capturing the most accurate and up-to-date information available at the time of preparation. The form is typically completed by the Situation Unit Leader or Section Planning Chief within the Incident Management Team but may also be completed by a local dispatcher or another staff member when necessary. Reports are logged for each operational period or when information becomes outdated in a quickly evolving incident. Each report describes current characteristics of the hazard, current environmental conditions, current and projected incident management costs, details about specific resources assigned to the incident, critical resource needs, a description of structural and life safety threats, an ongoing accounting of injuries, fatalities, damages, and the projected incident management outlook.
The format and content of the ICS-209 has evolved over time in parallel with efforts to adapt the form for all-hazards use. Our inspection of records identified new fields added in 2004, when the system was incorporated into NIMS, and again in 2007 to support all-hazards reporting. It is important to note that the use of NIMS/ICS was not mandatory on large incidents until fiscal year 2006 33 , and so there may be significant gaps in reporting prior to this date. Any time-series analysis exploring trends in the data must acknowledge this limitation. Additionally, the ICS-209-PLUS dataset is based on the published data and may exclude records containing sensitive information.
The raw data are published in three separate formats. The original format, Historical System 1 (HIST1), spans 1999 to 2002 and includes basic incident information, start location, personnel usage, and total structures damaged/destroyed. The second format, Historical System 2 (HIST2), was introduced in 2001 and captures a broader set of information related to the hazard including incident complexity, fire behaviour, fuels, and local weather. It contains freeform narrative text fields to capture projected risk to communities, resources at risk, critical resource needs, planned actions, and projected hazard movement/spread. Additional societal impact values include injuries, fatalities, evacuations in progress, and estimates of structures threatened. The current system format was released in 2014. Tighter standardisation of values on the form resulted in cleaner categorical data. Major changes include an expansion of formats for capturing point-of-origin data, expanded functionality for tracking casualties and illnesses, and expanded functionality for tracking life safety management. Table 2 provides a high-level summary of data elements in the ICS-209-PLUS dataset. Refer to 35 for a description 33 .
The complex record clusters all fires and sitreps associated with a fire complex under the same incident number, capturing individual fire names, suppression strategies, current containment percentages and estimated costs to date. The current version also includes the current area for individual fires within the complex. This information is missing in both historical versions of the system. We used data that was manually compiled and verified to derive the Wildfire Complex Associations table for wildfires between 1999 and 2010 and data derived initially from the complex associations table for 2010+ to describe the relationship between fire complexes and individual fires. This process is detailed in the methods below.
Finally, the current version has an incidents table that contains basic incident level information including discovery date, cause, area, location, and estimated cost to date. Concatenated versions of the original complex and incident tables are included in the dataset (Table 1), but we did not clean or modify these tables. We created a new Wildfire Incident Summary
Open/reproducible framework. We produced the ICS-209-PLUS dataset using principles of open and reproducible science [36][37][38] . All data source files and the final ICS-209-PLUS dataset are archived online 35 . The Python source code for the ICS-209-PLUS creation 39 , the R source code to link to the Fire Events Delineation (FIRED) dataset 40 , and the spatiotemporal linkage and all figures and tables 41 are publicly available. Our aims are twofold: to provide transparency to the methods and assumptions used to produce the final dataset and to provide a framework for others to adapt or expand upon the dataset. The code is written in Python using the Numpy and Pandas data science libraries. We were unable to automate the downloading of the raw data from the FAM archive and so our code assumes all relevant tables (Table 1) are downloaded to the corresponding annual directories beforehand.
We accomplished several key objectives in this updated version. First, we aligned data elements and standardised values across both historical versions with the current data model, extending the record through 2020. This allows for seamless comparison of records across the entire time period. Secondly, due to the free-form nature of the fields and limited mechanisms enforcing data entry standards, the original data is notoriously messy and difficult to use. The scripts are designed to automate as much of the cleaning and formatting as possible, improving the overall consistency of the dataset. They are also designed to support the manual updates identified in the process of producing the dataset. This is important for several reasons. It allowed us to easily incorporate updates for fields such as Latitude and Longitude that were deemed critical for dataset use. In the initial release it provided a framework for incorporating the cleaning efforts and refinements curated by co-author Karen Short and for repairing issues identified in the Current version found in years 2015+ (see Repairing Records 2015+). New fields have been added based on values available in the current reporting system (see Extending the Dataset Beyond 2014). We connect the ICS-209-PLUS dataset with the Fire Program Analysis Fire-Occurrence Database (FPA FOD 42,43 ), enabling linkage with Monitoring Trends in Burn Severity (MTBS 44,45 ) fire products, and additionally connect to the novel spatial fire database, Fire Events Delineation (FIRED 46,47 ) (see Linking the Wildfire Incident Summary Record to FIRED). The updated dataset is then used to create a spatiotemporal linkage, which assigns wildfire incidents to corresponding county, tract, and block group units as well as quarters and years (see Assigning Wildfire Incident IDs to Spatial and Temporal Units).
The following sections detail how the dataset is produced from the merging of the original source files to the creation of the Wildfire Incident Summary table, and external linkages.
Producing the ICS-209-PLUS dataset. The ICS-209-PLUS dataset is produced by a series of Python scripts that first consolidate the annual files for each of the tables across the three versions of the system (Table 1). Each version of the sitrep table is then cleaned and prepared for the merge. This includes general cleaning and formatting for each field, field-level updates to correct known errors, and deletion of duplicate or erroneous records. Each of the related tables is pivoted and totals are calculated for personnel, aerial equipment, structures threatened, structures damaged, structures destroyed, injuries, fatalities, evacuations (2014+), and wildfire suppression strategy (2014+) for each situation report. These totals are joined into the situation report and then columns across the three versions are aligned and appended together. Once the data has been consolidated into a single dataset, individual fields are cleaned and smoothed, filling missing values and adjusting values where appropriate. This finalised version is then used to produce an all-hazards dataset (ICS-209-PLUS All-Hazards) and a wildfire dataset (ICS-209-PLUS WF). The wildfire dataset is composed of two tables: all the wildfire daily status summaries and an incident level summary record. The Wildfire Incident Summary contains high-level statistics that are useful from a research standpoint. An overview of the process followed to create both the all-hazards and wildfire datasets is summarized in the flowchart (see Fig. 1). The Python code containing the logic for each step is included in parentheses.
Cleaning and formatting the individual datasets. The scripts clean each version of the sitrep table prior to the merge. This is necessary to deal with subtle differences between each version. Unique identifiers are constructed within the historical datasets to separate out individual fire events and to group related incidents together. We clean and standardise values for each historical version so that they merge smoothly into the final dataset. Once this preliminary cleaning is complete, members of the historical dataset are compared with a refined version of the record and sitreps that are not members of this refined set are archived to a deleted sitreps table (described later).
Creating unique incident and fire identifiers. The Incident Number field is meant to uniquely identify an incident, but there are multiple issues with this field, particularly in the historical datasets. In some instances, incident numbers are incomplete, or they are re-used from year to year, resulting in sitreps for multiple incidents being grouped together as a single incident. Splitting them based on year is problematic because some fires, particularly in the southeastern United States, span the annual boundary or have a final report filed in the next year. There are also instances where Incident Name and point of origin are distinctly different but share the same incident number. Conversely, there are incidents in the current version that have the same incident number but are split across multiple unique system identifiers. Finally, there are instances where fires are incorporated into a fire complex and the Incident Number changes to that of the fire complex. We addressed these issues by creating two concatenated ID fields: the Fire Event ID and the Incident ID. The Fire Event ID is used to identify individual wildfires regardless of whether they are managed as part of a larger fire complex. The Incident ID is used to group all sitreps related to an incident response, clustering situation reports that are related but may differ in terms of the Incident Number and/or the Incident Name.
The fire event ID. The Fire Event ID is a concatenation of the Start Year and the Incident Number fields followed by a sequence number (default = 1). The Start Year separates instances where the Incident Number is re-used from year to year. We manually scanned sitreps in both historical versions sorted by Incident Number, Incident Name, Discovery Date, and the report date to identify records that needed to be split. For example, Incident Number "AR-ARS-D2" was assigned to three separate incidents starting in different locations at different times (Table 3). We split them by adjusting the sequential variable for Dierks to 2 and Red Barn to 3.
The incident ID. The Incident ID is a concatenation of the Start Year, the final Incident Number, and the final Incident Name, such that multiple fires can be grouped together if they are later incorporated into a larger response. This information is missing in the historical datasets, but was manually compiled and verified by co-author Karen Short over time during the compilation of the Fire Program Analysis Fire-Occurrence Database (FPA-FOD 42,43 ). That work included consulting several sources to piece together relationships between individual fires and fire complexes and to purge duplicate and erroneous situation reports across the two historical records. We use this cleaned version of the ICS-209 records, referred to as the Short master list 35 , as a definitive reference such that this table is used to create the Incident ID for all historical sitreps and determines which records should be deleted (see Purging Duplicate and Erroneous Records).
The example below is taken from the 2006 Boundary Complex. It illustrates how Incident ID is used to group related sitreps together (Table 4) while preserving the original identifiers. The complex includes the following individual fires: Boundary, Elkhorn 2, Lost Lake, Deer, Thicket, Chuck, East Elk, North Elk, and Knapp 2, all under Incident ID 2006_ID-SCF-006336_BOUNDARY COMPLEX, which also has its own sitreps. The Fire Event IDs are included at the far right to illustrate how the Incident ID allows for multiple physical fires to be grouped together as a single response, whereas the Fire Event ID provides a unique identifier for each physical fire event or management grouping.
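The two identifiers can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the pipe separator follows the Fire Event IDs shown in Table 4, and the underscore separator and upper-cased name follow the single Incident ID example quoted above, so both formats are assumptions inferred from those examples.

```python
def fire_event_id(start_year, incident_number, sequence=1):
    """Start Year + Incident Number + sequence number (default 1).

    Pipe-separated, following the Fire Event IDs shown in Table 4
    (assumed format).
    """
    return f"{start_year}|{incident_number}|{sequence}"


def incident_id(start_year, final_incident_number, final_incident_name):
    """Start Year + final Incident Number + final Incident Name.

    Underscore-separated with an upper-cased name, inferred from the
    Boundary Complex example (assumed format).
    """
    return f"{start_year}_{final_incident_number}_{final_incident_name.upper()}"


# Individual fires keep their own Fire Event ID ...
elkhorn = fire_event_id(2006, "ID-SCF-6245")
lost_lake = fire_event_id(2006, "ID-SCF-6349")

# ... while all of their sitreps share the Incident ID of the complex.
boundary = incident_id(2006, "ID-SCF-006336", "Boundary Complex")

print(elkhorn)
print(lost_lake)
print(boundary)
```

Raising `sequence` above 1 is how an Incident Number that was re-used within the same start year would be split into separate fire events.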
General field level cleaning. We used the Python data science tools to inspect values contained in each column across the three versions to determine what actions were needed to clean and prepare for the merge. Many columns had standardised values, but contained extraneous characters or inconsistencies. The script uses regular expressions to standardise values for fields like GACC Priority, Dispatch Priority, Percent Containment, Containment Date, and Incident Management Team Type. Once these values are standardised, they are linked to corresponding values in the lookup code tables. The script also removes all linefeeds and hidden characters from text fields to make viewing and processing the fields easier. Values such as "N/A", "same", or "none" and redundant values are deleted from the consolidated text fields. The script fixes any obvious date errors (e.g., year values of 1901 instead of 2001) and applies consistent formatting across all date fields. All Latitude and Longitude values have been converted to decimal degrees. We cleaned and formatted most of the fields except weather variables and fuels. We determined that both these fields would require extensive effort and fell outside the scope of the current release.
Throughout the process, we identified individual values that were clearly in error and made some individual field level updates. These updates are limited and are incorporated into the general field cleaning function for each script. There is potential to maintain these updates as part of a field level update table that could be loaded at runtime to automate individual field-level modifications. This would be an ideal solution to support ongoing update and maintenance of the dataset but is beyond the scope of the current release.
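The kinds of field-level cleaning described above can be sketched with the Python standard library. This is an illustration only: the function names, the placeholder list, and the 1999 to 2020 window used to detect century errors are assumptions for the example and do not come from the published scripts.

```python
import re
from datetime import date

# Placeholder values named in the text; treated case-insensitively here.
PLACEHOLDERS = {"n/a", "same", "none"}


def clean_percent(raw):
    """Pull the first number out of strings such as '85 %' or '12.5pct'."""
    match = re.search(r"\d+(?:\.\d+)?", raw or "")
    return float(match.group()) if match else None


def clean_text(raw):
    """Replace linefeeds and control characters, collapse whitespace,
    and return None for empty or placeholder values."""
    text = re.sub(r"[\x00-\x1f\x7f]+", " ", raw or "")
    text = re.sub(r"\s+", " ", text).strip()
    if not text or text.lower() in PLACEHOLDERS:
        return None
    return text


def fix_century(d, first_year=1999, last_year=2020):
    """Shift a date by 100 years when that moves it into the archive's
    time span (e.g. 1901 -> 2001); otherwise return it unchanged."""
    if d.year < first_year and first_year <= d.year + 100 <= last_year:
        return d.replace(year=d.year + 100)
    return d


print(clean_percent("85 %"))
print(clean_text("Line one\r\nline two\x00"))
print(clean_text(" N/A "))
print(fix_century(date(1901, 7, 4)))
```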
Transforming standardised fields. Standard values remained relatively consistent across the three versions, with new values added as the form was adapted for all-hazards use. The Cause and Suppression Method Abbreviation fields changed slightly from the historical to the new version and so we translated old values to equivalent new values (Table 5). A handful of Incident Types were eliminated in the current system. After careful consideration, we decided to keep the historical values for consistency and to prevent information loss. Prescribed burns (RX) and Wildfire Used for Resource Benefit (WFU) have been included in the Wildfire datasets. The ICS-209-PLUS Form is not intended for tracking planned ignitions and so the RX incident type is rare (0.4% of incidents), but the form was sometimes used to request resources during periods of resource scarcity. There are only 144 incidents (471 sitreps) with Incident Type RX from 1999 to 2013 with most occurring in 2005 (31 incidents). The WFU Incident Type was obsoleted in 2009. There are 772 Incident Summary records and 6120 sitreps for that type prior to 2009. We reclassified all values that were binary (yes/no) to boolean values (true/false) to make them consistent and to put them in a more standard database format.
Cleaning and consolidating narrative text. Each version of the Incident Status Summary provides space for recording important observations from incident command. The earliest version of the report (Historical System 1) has only one Narrative field whereas later versions have multiple narrative text fields organized around the following topics: critical resource needs, current threats, projected incident movement and spread, weather, fuels, relevant conditions, and general remarks. Critical resource needs, current threats, and projected fire activity capture projected values at 12, 24, 48, 72, and greater than 72 hours from the current report. We consolidated these observations into one narrative field for each topic to manage the complexity of the dataset, eliminate redundancy, and to organize the observations for potential text mining and topic modeling efforts. Before consolidating, we clean each individual field to strip hidden characters, eliminate placeholder values (e.g. "n/a", "same", "none") and eliminate duplicate values. A pipe '|' character is used to separate observations. For example, the following entries in the projected activity fields: Projected Movement 12: "Minimal fire movement due to lower temps higher RH and precipitation. " Projected Movement 24: "Minimal fire movement due to lower temps higher RH and precipitation. " Projected Movement 48: "Moderate fire activity is anticipated on Friday due to warming temps, falling RH, and wind. " Projected Movement 72: "same" are consolidated into a single Projected Activity Narrative: "Minimal fire movement due to lower temps higher RH and precipitation|Moderate fire activity is anticipated on Friday due to warming temps, falling RH, and wind. " Table 6 summarizes the narrative text fields in the final dataset. The boldfaced fields are the newly consolidated fields that condense projected values into a single narrative summary. The version column identifies which versions populate this field.  
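The consolidation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the published implementation: the function name and placeholder set are hypothetical, and the sketch keeps each entry's original punctuation instead of trimming it.

```python
# Placeholder values named in the text; compared case-insensitively.
PLACEHOLDERS = {"n/a", "same", "none"}


def consolidate(entries):
    """Join the distinct, non-placeholder entries with a pipe character,
    preserving the order in which they first appear."""
    seen = set()
    kept = []
    for entry in entries:
        text = " ".join((entry or "").split())  # collapse stray whitespace
        key = text.lower()
        if not text or key in PLACEHOLDERS or key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return "|".join(kept)


projected_movement = [
    "Minimal fire movement due to lower temps higher RH and precipitation. ",
    "Minimal fire movement due to lower temps higher RH and precipitation. ",
    "Moderate fire activity is anticipated on Friday due to warming temps, "
    "falling RH, and wind. ",
    "same",
]
print(consolidate(projected_movement))
```

Applied to the four Projected Movement entries from the example, the duplicate 24-hour entry and the "same" placeholder are dropped, leaving two observations separated by a single pipe.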
ID-SCF-006336  Boundary Complex  39  8/8  2006|ID-SCF-006336|1
ID-SCF-6245  Elkhorn2  1  8/9  2006|ID-SCF-6245|1
ID-SCF-6349  Lost Lake  2  8/8  2006|ID-SCF-6349|1
ID-SCF-6369  Deer  2  8/31  2006|ID-SCF-6369|1
ID-SCF-6373  Thicket  1  8/7  2006|ID-SCF-6373|1
ID-SCF-6415
Method also allows for multiple suppression methods and percentage reporting across corresponding strategies.
Point of origin coordinates. Given the critical role point of origin data plays in geospatial analysis, we manually cleaned and inspected the point of origin coordinates, fixing obvious errors and providing estimates for missing or obviously erroneous values. The values in the earliest system (Historical System 1) were first converted from degrees and minutes to decimal format. We then mapped all the points, identifying those that fell outside of the United States and its territories. The most common issue was an incorrect numeric sign for latitude or longitude. The longitude was incorrect for 98.5% of the reported locations in the second historical system (Historical System 2). Nearly all values in the Current System (99.72%) fell within the United States and its territories. Wherever possible, we used latitude/longitude values from the FPA FOD for missing and erroneous values. We then manually examined the remaining values that fell outside of the clipped boundaries individually. We used the information contained in other point of origin fields (e.g., the location description) to estimate latitude and longitude. For each of these estimated values, we set the Lat/Long Update flag to true and set the Lat/Long Confidence field to capture our level of confidence in this estimate (low, medium, high). We rated our estimate as high to medium if we were able to get close to the actual point of origin (e.g., intersection of roads) and low if the location description was vague (e.g., 6 miles southwest of Sisters, Oregon).
Our goal was to maximize available geospatial information while allowing users of the data to filter out low-confidence or updated values when a high level of accuracy is needed. The accuracy and completeness of the data improve over time across the three versions, as do the location description fields available for estimation. In the earliest version (Historical System 1), 29% of coordinates were missing or erroneous but we were able to populate nearly half (49%) of the missing values with estimates taken from the corresponding record in the FPA-FOD database. With limited information, we were only able to derive a point of origin for 103 additional values (12%) with the majority of those (89 of the incidents) estimated as low confidence due to limited location information.
In contrast, only 2% of the coordinates were missing or erroneous in the second historical version (Historical System 2) and we were able to populate 45% of the missing values with estimates taken from the corresponding record in the FPA FOD. We were able to estimate an additional 26% of missing values with a mix of confidence levels (42 incidents with high confidence, 26 with medium confidence, and 50 with low confidence levels). Finally, only 1.6% of the coordinates were missing or erroneous in the 2014+ data with over half of the missing values populated with values taken from the corresponding record in the FPA FOD, and we were able to estimate all but 2 of the remaining values with a high level of confidence with roughly half requiring a simple swap of latitude and longitude values to correct. Table 8 below summarizes latitude and longitude updates by system and corresponding levels of confidence.
Some of these records may get deleted as part of the merging and cleaning process described below resulting in 98% of incidents with a valid point of origin in the final dataset.
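The two most common coordinate repairs described above (a wrong numeric sign and swapped latitude/longitude values) can be sketched as follows. This is an assumption-laden illustration: the authors mapped points against US boundaries, whereas this sketch substitutes a coarse, hypothetical bounding box that covers the contiguous states, Alaska, Hawaii and Puerto Rico but not the Pacific territories, and the function names are invented for the example.

```python
def dm_to_decimal(degrees, minutes):
    """Convert degrees and decimal minutes to decimal degrees."""
    sign = -1 if degrees < 0 else 1
    return sign * (abs(degrees) + minutes / 60.0)


def in_rough_us_box(lat, lon):
    """Coarse stand-in for a real boundary test (assumed extent)."""
    return 17.0 <= lat <= 72.0 and -180.0 <= lon <= -64.0


def repair_point(lat, lon):
    """Try the reported point, then the common fixes, and return the
    first candidate inside the box together with a note on what changed.
    Returns None when no candidate fits and manual review is needed."""
    candidates = [
        (lat, lon, "as reported"),
        (lat, -lon, "longitude sign flipped"),
        (lon, lat, "latitude/longitude swapped"),
        (lon, -lat, "swapped, then longitude sign flipped"),
    ]
    for cand_lat, cand_lon, note in candidates:
        if in_rough_us_box(cand_lat, cand_lon):
            return cand_lat, cand_lon, note
    return None


# Hypothetical point in central Oregon recorded with a positive longitude.
print(repair_point(44.29, 121.55))
# The same point with latitude and longitude swapped.
print(repair_point(-121.55, 44.29))
print(dm_to_decimal(44, 30))
```

Any point repaired this way would still warrant an update flag and a confidence rating, as described in the text; points that return `None` correspond to the values the authors examined manually.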
Preparing to merge. The individual fields and values in the incident status reports remained relatively consistent across the three versions, but the underlying data model continued to evolve to adapt to all-hazards management and to capture more detailed information about resources, life safety threats, and management actions. Our goal when mapping values across the three versions was to maximize continuity while making the historical data forward compatible with the current system. Most of the data elements aligned with minimal or no modification. There were several data elements that had no equivalent in the current system. We preserved the ones that had a high fill rate: Major Problems, Observed Fire Behavior, and Terrain. A list of the resulting columns, including their fill rates, is provided in the ICS-209-PLUS Sitrep Table Definition file 35 .
In addition to the consolidated text fields described in the section above, we added fields to the situation report to align the historical data with the current version or where they added value. For example, the Acres field provides a convenient way to compare incident area without having to convert units of measurement. The day of year (DOY) fields (Discovery DOY and Report DOY), Start Year, and Current Year (CY) support simple querying and analysis without having to manipulate the related timestamps. The script also integrates totals calculated from the related tables into the incident status record.
Purging duplicate and erroneous records. As mentioned above, the historical datasets overlap between 2001 and 2003, and sometimes incident status reports were logged in both systems resulting in duplicate records across the two systems, along with other erroneous records. Many of these records were deleted from the dataset originally maintained by Karen Short. Rather than deleting each of these records explicitly, we use the records in the Short dataset as a master list.
Any wildfire that does not exist in the master list is removed from the production dataset. Once the cleaning and formatting of the sitrep table is complete, the wildfires in the master list are moved to the production dataset and the deleted records are archived to a separate deletions file for reference (Table 16). The comparison resulted in the deletion of 527 sitreps from the first historical dataset (3.7%, 57% of these overlapping with Historical System 2) and 3,597 sitreps from the second historical dataset (3.4%).
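The master-list comparison amounts to partitioning the sitreps by membership in a reference set. The sketch below shows that idea in plain Python; the record layout, the `incident_key` field, and the sample values are hypothetical, and the published scripts (which use Pandas) may match records on different fields.

```python
def split_by_master(sitreps, master_keys, key="incident_key"):
    """Partition sitreps into those found in the master list (kept for
    production) and those that are not (archived as deletions)."""
    kept, deleted = [], []
    for record in sitreps:
        if record[key] in master_keys:
            kept.append(record)
        else:
            deleted.append(record)
    return kept, deleted


# Hypothetical records: the third one has no match in the master list.
sitreps = [
    {"incident_key": "2002_A", "report_date": "2002-06-01"},
    {"incident_key": "2002_A", "report_date": "2002-06-02"},
    {"incident_key": "2002_X", "report_date": "2002-06-02"},
]
master_keys = {"2002_A", "2002_B"}

production, deletions = split_by_master(sitreps, master_keys)
print(len(production), "kept;", len(deletions), "archived")
```

Keeping the rejected records in a separate list, rather than discarding them, mirrors the deletions file described in the text and keeps the purge auditable.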
We found a duplication issue in the Current System when reporting extended into the following year. For some incidents, some or all of the associated sitreps were being duplicated and assigned a new unique system identifier. We manually inspected the records to determine if it made sense to delete one of the incidents or merge the two into a single incident. This resulted in the deletion of 108 incidents (527 sitreps) and the merging of 49 overlapping incidents. The deleted records are archived in a deletions file (see output files in Table 16). Details about duplicate incident identifiers can be found in the Incident Cleaning File (ics-inc-cleanup{yyyy-toyyyy}.csv) included with the source files.
Record repair methods 2014+. There were issues with the dataset between 2015 and 2017 that were unresolvable at the time of our prior release. Structure and resource data was missing in 2015 and 2016 because it had been overwritten with data from 2014. In some cases, the sitreps in 2015 and 2016 are also missing important values within the reports such as INCIDENT_NAME or INCIDENT_NUMBER. There is significant data loss across all reports within these years. Additionally, the point of origin coordinates in 2017 were corrupted with random values across individual incidents.
We worked with contacts at the National Interagency Coordination Center to update the overwritten files. We patched the sitreps using values from the SIT209_HISTORY_INCIDENTS table. This allowed us to fix the static information such as incident name, incident number, discovery date, cause, and all the location description fields including the point of origin coordinates. The situation reports in 2015 and 2016 are missing narrative text and other content that we were unable to repair, but the fill rates for these fields are roughly equivalent to those of other years. We also used the point of origin coordinates from this table to fix the inconsistencies within the 2017 reports.
Merging and final cleaning. Once each of the individual datasets is refined, historical data elements are renamed to align with the corresponding columns in the current version. The script then makes a final cleaning and smoothing pass across the records, filling missing values where appropriate and smoothing data elements to make them more consistent. The specifics are described below.
Filling missing values. Several fields in the dataset are either cumulative or the value, once known, is unlikely to change. We forward filled these fields with the previous known value to minimize gaps and to make sure that these values were propagated to the final report. This was important not just for consistency, but also because these records are used to produce the Wildfire Incident Summary.
Smoothing acres and calculating new acres. Once the forward-filling of acres is complete, we perform a backwards smoothing pass. If an estimate of Acres is reduced on a subsequent report, we reduce the number of acres on previous reports given that a fire never truly gets smaller; this is to address overestimations at the time of reporting. After making the necessary adjustments, we use the values to populate the New Acres field, which is then used to calculate the daily fire spread rate (Wildfire FSR) (see Wildfire Incident Summary section below).
The cost estimate fields were also particularly prone to data entry error and variations in notation, particularly for estimates in the millions or billions of dollars. When the records were sorted by incident and report date, it was easy to identify instances where someone either left off a digit or added too many for that particular day. Also, as cost increased, sometimes notation changed to simplify data entry (e.g., 1,200,000 becomes 1.2 for $1.2 million).
After forward filling the values, we started the cleaning process by manually inspecting all instances where the final reported values were an order of magnitude smaller than the maximum value entered across the reports. We designated the final value based on comparison of trends across the existing reports. We were conservative, only correcting obvious errors. These corrections are applied individually in the cost adjustments function of the merge script. Once corrections were made to the final reported values, we performed two smoothing passes. We first worked backward from the final cost, adjusting any estimates that were more than 10x larger than the current value by reducing them until they were within the 10x limit. We then worked forward, adjusting any values that were at least 9x smaller than the previous estimate until they fell within the 9x limit. When both these passes were complete, if there was no value for the Projected Final Incident Management Cost, we defaulted it to the Estimated Incident Management Costs to Date on the final report.
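The filling and smoothing passes described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the merge script itself: the function names are hypothetical, earlier acre estimates are capped at the next report's value, and cost values are moved by factors of ten (an assumption motivated by the dropped-digit and added-digit errors noted above; the text does not state the adjustment step).

```python
def forward_fill(values):
    """Replace None with the most recent known value (leading gaps stay None)."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def smooth_acres_backward(acres):
    """Walk backward and cap each estimate at the following report's value."""
    out = list(acres)
    for i in range(len(out) - 2, -1, -1):
        if out[i] is not None and out[i + 1] is not None and out[i] > out[i + 1]:
            out[i] = out[i + 1]
    return out


def new_acres(acres):
    """Growth since the previous report; the first report counts from zero."""
    out, previous = [], 0
    for value in acres:
        if value is None:
            out.append(None)
            continue
        out.append(value - previous)
        previous = value
    return out


def smooth_costs(costs, back_limit=10, forward_limit=9, step=10):
    """Two-pass cost smoothing on a forward-filled series (no None values).

    Assumption: out-of-range values are scaled by `step` until within limits.
    """
    out = list(costs)
    # Backward pass: shrink estimates more than 10x larger than the next one.
    for i in range(len(out) - 2, -1, -1):
        if out[i + 1] > 0:
            while out[i] > back_limit * out[i + 1]:
                out[i] /= step
    # Forward pass: grow estimates at least 9x smaller than the previous one.
    for i in range(1, len(out)):
        if out[i] > 0:
            while out[i] * forward_limit <= out[i - 1]:
                out[i] *= step
    return out


# Illustrative series only; these are not values from the dataset.
acres = smooth_acres_backward(forward_fill([100, None, 500, 450, 450, 900]))
growth = new_acres(acres)
costs = smooth_costs([100000, 5000000, 200000, 20000, 250000])
```

Running the backward cost pass first means the manually verified final value anchors the series, so a single inflated mid-incident estimate is corrected before the forward pass looks for dropped digits.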

Creating the wildfire incident summary record. The cleaned and merged Incident Status Summary table is used to create the Wildfire Incident Summary table. This table extracts key elements from the individual sitreps to describe the fire and support high-level analysis across wildfire events. This information includes the cause, discovery date information, final acres, final estimated costs, injuries, fatalities, whether evacuations were recorded at any point during the fire, and the point of origin (latitude/longitude) for the fire. The summary also includes metrics for threats to life and property, structures damaged/destroyed, and firefighting response. In the cases of threats and response, the daily/regular Incident Status Summary (sitrep) values cannot be summed to get a true accounting of things like total personnel assigned. Instead, we sum these values across sitreps to provide indices representing the level of threat or response across the duration of the incident. We also identify peak volumes and corresponding days across the fire including peak personnel, peak aerial response, and peak fire spread. Finally, we calculate what we call the Cessation Date, the date when the fire grew to within 95% of its final size. This metric is valuable because the containment date may be quite conservative, with incident management teams hesitant to declare a fire contained until there is very limited risk of further growth.
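As a rough illustration of how such summary metrics could be derived, the sketch below computes a cessation date, peak growth, and a summed personnel index for one incident. The dictionary keys (`date`, `acres`, `personnel`) and the output names are hypothetical and do not correspond to the published column names; the reports are assumed to be sorted by date with acres already filled and smoothed.

```python
from datetime import date


def summarize_incident(reports, cessation_fraction=0.95):
    """Summarize one incident from its date-sorted, already smoothed sitreps."""
    final_acres = reports[-1]["acres"]
    # First report on which the fire had reached 95% of its final size.
    cessation_date = next(
        r["date"] for r in reports if r["acres"] >= cessation_fraction * final_acres
    )
    # Growth between consecutive reports; the first report counts from zero.
    growth, previous = [], 0
    for r in reports:
        growth.append((r["acres"] - previous, r["date"]))
        previous = r["acres"]
    peak_growth, peak_growth_date = max(growth, key=lambda g: g[0])
    peak_report = max(reports, key=lambda r: r["personnel"])
    return {
        "final_acres": final_acres,
        "cessation_date": cessation_date,
        "peak_growth_acres": peak_growth,
        "peak_growth_date": peak_growth_date,
        "peak_personnel": peak_report["personnel"],
        "peak_personnel_date": peak_report["date"],
        # An index of response over time, not a physical headcount.
        "total_personnel_index": sum(r["personnel"] for r in reports),
    }


# Illustrative reports only; these are not values from the dataset.
reports = [
    {"date": date(2017, 7, 1), "acres": 300, "personnel": 50},
    {"date": date(2017, 7, 2), "acres": 1000, "personnel": 200},
    {"date": date(2017, 7, 3), "acres": 9000, "personnel": 800},
    {"date": date(2017, 7, 4), "acres": 9600, "personnel": 600},
    {"date": date(2017, 7, 5), "acres": 10000, "personnel": 100},
]
summary = summarize_incident(reports)
```

In this toy series the cessation date falls one day before the final report, which is the kind of gap between effective and declared end of growth that the metric is meant to expose.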
Linking to additional wildland fire datasets. The co-authorship of this paper made linking with the Fire Program Analysis Fire-Occurrence Database (FPA FOD 42,43 ) a logical extension of the ICS-209-PLUS dataset. The FPA FOD is a compilation of final fire reports from the federal, state, and local fire services. The wildfire records include a final determination for location (i.e., point of origin), cause, discovery and containment dates, and fire size. It also provides connectivity to MTBS 44,45 . The MTBS products, in turn, provide satellite-based wildland fire perimeter and burn severity data.
The Incident ID is used to join Wildfire Incident Summary records with records in the FPA FOD Extract file. This file is an Excel spreadsheet published as part of the dataset using the naming convention FOD_JOIN_{mmddyyyy}.xlsx. Matching records between the two datasets was an iterative process, with a focus on fires >100 acres having a clearly defined point of origin. At time of publication, 86% of all incidents in the Wildfire Incident Summary table (regardless of size) link to at least one FPA FOD record. As we continue to clean and refine the dataset we will publish incremental updates to this file 35 .
In the latest release we use a combination of the FPA FOD Extract file and the Wildfire Complex Configurations file. The columns used to link to the FPA FOD are described below (Table 11) and the columns added to the Wildfire Incident Summary, including each of the FPA FOD and MTBS fields, are described in Table 12. The FOD_FIRE_LIST is in JSON format. This provides both a human-readable and machine-parsable summary at the incident level. For example, the 1999 Arizona Jump Complex has three records in the fire-occurrence database and so the FOD Fire List contains three associated entries.
The Wildfire Complex Associations table is more concise, complete, and easier to decipher than records in the annual Complex Associations tables. In the current system there are more than two hundred thousand records in the latter. Multiple table joins are needed to understand how member fires and complexes relate to one another, and most of these complexes have no corresponding sitreps in the sitrep table. By including the FOD Extract File we increase the number of fire complexes defined by 31% and member fires by 69%.
The FPA FOD Extract file defines relationships between individual fires and fire complexes in the FPA FOD. The Wildfire Complex Configurations file defines relationships between fire complexes and member fires within the ICS-209-PLUS dataset. The Wildfire Complex Configurations file is published with the source files using the naming convention cpx-assocs{yyyytoyyyy}.csv. This file is copied verbatim as the foundation for the Wildfire Complex Associations table, which is then linked to the FPA FOD and MTBS through the FOD_FIRE_LIST and MTBS_FIRELIST attributes, respectively (see Table 13).
Linking the wildfire incident summary record to FIRED. Wildfire events are compiled from FIRED (Fire Events Delineation) 46,47 , a novel database of global wildfire perimeters derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) burned area product (MCD64A1) (2001-2020) 48,49 . These satellite-derived wildfire events provide additional information not captured in MTBS, including smaller wildfires below MTBS's defined burn area thresholds (≤1000 acres in the Western US and ≤500 acres in the Eastern US); estimates of the daily fire growth patterns such as the simple fire spread rate (acres/day); maximum single-day fire growth (acres); and the point of estimated ignition location (or the coordinate of the first identified burned pixel). Additionally, FIRED wildfire burn footprints can distinguish geographic areas within the burn footprint that may not have burned, a distinction that MTBS footprints do not always make. Linking Wildfire Incident Summary Records (hereafter "wildfire incidents") to corresponding FIRED events involves a multi-step approach that processes four distinct subsets of wildfire incidents. At each step of the process, previously joined wildfire incidents are removed to avoid duplication.
First, wildfire incidents that are not associated with a complex (single incidents) are matched to their associated MTBS footprint using the linkage from the FPA FOD. Using these footprints, corresponding FIRED events are identified using a spatial overlay where the largest spatially overlapping polygon in the same fire year is retained. Matches were discarded if the ignition date difference exceeded 25 days and the difference in burned area exceeded either 50,000 absolute burned acres or a 50% difference in final burned acres, relative to the FPA FOD's reported metrics. For each matched incident-FIRED event pair, the distance between the wildfire incident Point of Origin (POO) and the FIRED event centroid is calculated to inform the following steps.
Second, wildfire complex incidents are processed separately, using the Wildfire Complex Associations table (see section above) to retain the final complex incident report and discard member incident reports. Because satellite-derived events often map complexes as a single perimeter, we consider the final complex incident report to be the most accurate in describing overall area burned. This set of wildfire complex incidents is converted to spatial points using their POO coordinates. From each POO, a 25 km buffer is created. This buffer distance was selected based on the average and standard deviation of distance between wildfire incident POO and FIRED event centroids calculated from the first set of wildfire incidents described above. For each year, all FIRED events intersecting a given wildfire incident buffer are identified and filtered to retain events that match the spatial and temporal thresholds described above. This method often produces multiple corresponding FIRED events for each wildfire incident. In these instances, we retain the event with the minimum difference in area burned. To ensure high confidence in these buffer-based spatial joins, we then apply a more restrictive spatial threshold of 5,000 absolute acres or 35% difference in burned acres, removing potentially erroneous joins.
Third, unmatched incidents which do not have a link to the FPA FOD and which are not associated with a fire complex are processed using the same buffer-based methods and thresholds described above. After completing this process, we use a K-nearest neighbour approach and match all remaining wildfire incidents to their 50 spatially nearest neighbors that occurred in the same year and that met our established spatial and temporal thresholds. This subset of incidents includes those in which the POO coordinates may be further from a corresponding FIRED event than the buffer distance used in previous steps.
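The buffer-based matching used in the second and third steps can be illustrated with a simplified sketch. It departs from the method above in ways that should be read as assumptions: each FIRED event is represented by a single centroid point rather than a polygon intersecting the buffer, distance is great-circle distance on a spherical Earth, and the burned-area thresholds are left out and would be applied as a further filter. All dictionary keys and values are hypothetical.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def best_fired_match(incident, events, buffer_km=25.0, max_days=25):
    """Pick the same-year event within the buffer with the closest burned area."""
    candidates = [
        e for e in events
        if e["ignition"].year == incident["discovery"].year
        and haversine_km(incident["lat"], incident["lon"], e["lat"], e["lon"]) <= buffer_km
        and abs((e["ignition"] - incident["discovery"]).days) <= max_days
    ]
    if not candidates:
        return None
    # Several events may fall inside the buffer; keep the closest in area.
    return min(candidates, key=lambda e: abs(e["acres"] - incident["acres"]))


# Illustrative values only; these are not records from either dataset.
incident = {"lat": 42.0, "lon": -124.0, "discovery": date(2017, 7, 12), "acres": 190000}
events = [
    {"id": "e1", "lat": 42.10, "lon": -124.05, "ignition": date(2017, 7, 15), "acres": 150000},
    {"id": "e2", "lat": 42.05, "lon": -124.00, "ignition": date(2017, 7, 14), "acres": 185000},
    {"id": "e3", "lat": 44.00, "lon": -124.00, "ignition": date(2017, 7, 13), "acres": 190000},
    {"id": "e4", "lat": 42.02, "lon": -124.01, "ignition": date(2017, 9, 30), "acres": 189000},
]
match = best_fired_match(incident, events)
```

Here `e3` is excluded by distance and `e4` by ignition date, so the choice between `e1` and `e2` is settled by the smaller difference in burned area.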
The process described above results in 14,796 incident-FIRED event pairs across the Continental United States (CONUS) and Alaska, representing 43.8% of total incidents (1999-2020). This important subset captures 81.4% of burned area, 91.4% of residential structures destroyed, and 86.3% of estimated suppression expenditures. This procedure provides burned area footprints for thousands of incidents unaccounted for by MTBS due to the dataset's acreage thresholds for inclusion. Through the FIRED linkage, we join 7,263 additional wildfire footprints to ICS records, which provides critical spatial details for the spatiotemporal linkage (details below).
We validated the linked incidents by predicting the burned area in acres reported by the Wildfire Incident Summary Records using the satellite-derived burned area in FIRED events and a linear regression model (R² = 0.95). This strong goodness of fit demonstrates a high confidence of correspondence between the two datasets. The code used to generate the ICS-209 to FIRED join is available as a public repository 40 . The following fields were added to the Wildfire Incident Summary Table (Table 14).
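For readers who want to repeat this kind of check on their own subset, the coefficient of determination of a simple least-squares fit can be computed as sketched below. The function name is hypothetical and the data are toy values, not the linked incidents; whether the published model used transformed or untransformed acres is not stated here, so this sketch assumes untransformed values.

```python
def r_squared(x, y):
    """R² of an ordinary least-squares line fitted to predict y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Toy data: x stands in for satellite-derived acres, y for reported acres.
perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
noisy = r_squared([1, 2, 3], [1, 3, 2])
```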
Assigning wildfire incident IDs to spatial and temporal units. To facilitate analyses of wildfire impacts in conjunction with concurrent social and environmental phenomena, we link wildfire records from the ICS-209-PLUS to the relevant spatial units and temporal periods in which each incident occurred (ICS-209-PLUS spatiotemporal linkage). Each observation is uniquely identified by its GEOID (spatial unit identifier), INCIDENT_ID, quarter, and year (Table 15). The spatiotemporal linkage is available for three geographies (counties, census tracts, and census block groups) and assigns 97.9% of ICS-209-PLUS wildfire incidents to corresponding spatial and temporal units.
To produce the linkage, we exclude prescribed fires (Incident Type = RX) and incidents created to track multiple fires contained during the initial attack phase. We then assign each incident to corresponding spatial units based on each incident's available MTBS ID, FIRED ID, and/or point of origin. We use fire perimeters from the MTBS Burned Areas Boundaries Dataset released on April 28, 2022, and FIRED fire perimeter data 46,47 . If neither MTBS ID nor FIRED ID is available for an incident, it is assigned only to the spatial unit in which its reported point of origin resides. Boundaries for these spatial units are all taken from the year 2020 and do not account for changes in spatial units over time 50 . The variable SPATIAL_DATA_ORIGIN (Table 15) indicates how each GEOID was assigned (by MTBS footprint, FIRED footprint, both, or only POO). Via the INCIDENT_ID field, users can then join the spatiotemporal linkage to the full suite of incident variables contained in the ICS-209-PLUS incident tables.
The ICS-209-PLUS spatiotemporal linkage allows researchers to aggregate wildfire impacts across different spatial and temporal scales. This ability offers several advantages over one of the few national datasets that documents wildfire-related losses, the Spatial Hazard Events and Losses Database for the United States (SHELDUS) 51 . First, the spatiotemporal linkage provides finer spatial resolution than SHELDUS's county scale, including estimates for counties as well as for census tracts and census block groups. Second, the spatiotemporal linkage to the ICS-209-PLUS reports direct damage metrics (e.g., counts of structures destroyed) instead of monetary losses (e.g., dollar value of structures destroyed). Measures of direct damage avoid the property bias embedded in monetary loss metrics, in which physical damage to a higher-value property is weighted more heavily than an equal amount of physical damage to a lower-value property. Direct damage metrics further avoid temporal bias that can emerge in longitudinal hazard data, in which monetary losses are not adjusted for inflation over time, and thus appear higher in more recent reporting 52 .
When aggregating the spatiotemporally linked ICS-209-PLUS data, it is important for users to note that wildfire perimeters may cross multiple spatial boundaries, particularly among larger fires and for smaller spatial units (see Fig. 2 as an example). Similarly, the duration of a wildfire may cross temporal boundaries, burning over multiple quarters or years. A simple join between the spatiotemporal linkage and ICS-209-PLUS incident tables will not allocate variable values across spatial units or over temporal periods, and instead will yield fire-wide totals for each spatial and temporal unit in which a given incident occurs. For instance, if a wildfire burned across two adjacent census tracts and destroyed five buildings, the spatiotemporal linkage will associate each census tract with five destroyed buildings. Similarly, in the case of a wildfire that burned over the course of two quarters and destroyed five buildings, the spatiotemporal linkage will associate each quarter with five destroyed buildings. This approach means that caution should be used when summarizing aggregate values across spatial or temporal units, which, in some cases, can result in double-counting. We estimate that 8.3% of wildfire incidents occur in more than one county, 14.4% occur in more than one census tract, and 17.2% occur in more than one census block group. Additionally, we estimate that 6.9% of wildfire incidents occur in more than one quarter. Researchers need to make careful decisions about how to account for this characteristic of the data when using the spatiotemporal linkage.
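The double-counting issue can be made concrete with a small example. The sketch below compares a naive join, which repeats fire-wide totals in every unit, with an equal-share allocation. Equal sharing is only one of several possible allocation rules and is an assumption of this illustration, not a recommendation from the dataset; the identifiers and counts are hypothetical, and the linkage rows are assumed to be unique per spatial unit and incident.

```python
from collections import defaultdict

# (GEOID, INCIDENT_ID) rows; hypothetical identifiers and values.
linkage = [
    ("tract_1", "fire_A"),
    ("tract_2", "fire_A"),
    ("tract_2", "fire_B"),
]
structures_destroyed = {"fire_A": 5, "fire_B": 2}


def naive_totals(linkage, values):
    """Simple join: every unit receives the full fire-wide total."""
    totals = defaultdict(int)
    for geoid, incident_id in linkage:
        totals[geoid] += values[incident_id]
    return dict(totals)


def equal_share_totals(linkage, values):
    """Split each incident's total evenly across the units it touches."""
    units_per_incident = defaultdict(int)
    for _, incident_id in linkage:
        units_per_incident[incident_id] += 1
    totals = defaultdict(float)
    for geoid, incident_id in linkage:
        totals[geoid] += values[incident_id] / units_per_incident[incident_id]
    return dict(totals)


naive = naive_totals(linkage, structures_destroyed)
shared = equal_share_totals(linkage, structures_destroyed)
```

Summing the naive per-tract values gives 12 destroyed structures although only 7 were reported, whereas the equal-share totals sum to 7. The same reasoning applies when an incident spans more than one quarter.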

Data records
The ICS-209-PLUS dataset spans 22 years from 1999 to 2020 and contains 187,160 Incident Status Summary reports for 35,170 all-hazard incidents. The dominant hazard in the dataset is wildland fire (98.4%) with the remaining 1.6% spread across other hazards. The number of incidents is lower overall prior to the 2005 mandate but contains roughly the same distribution of fire and non-fire incidents as in subsequent years. Given the dominance of wildfire in the dataset, we created three tables specifically for wildland fire analysis: a wildfire-specific Incident Status Summary

It is important to note that the ICS-209-PLUS dataset represents a small but important subset of wildfires (1-2%). The patterns presented in this technical validation are only for large or otherwise significant incidents and not meant to be interpreted as holding for wildfires in general.
Wildfire distribution. Wildfires requiring ICS-209 reporting are most numerous in the interior of Alaska and the continental US, with hotspots in parts of California, the Northern Rockies, Northern Forests, and parts of the southeastern US (Fig. 3). Fire reporting is notably limited in areas of the Central Midwest and northeastern US (Fig. 3), and there were relatively few fires reported from Hawaii and Puerto Rico (not shown).

Spatial distribution across key variables.
We examined the spatial distribution across six variables within the dataset: burned area, maximum fire spread rate, estimated incident management costs, total assigned personnel (described below), maximum structures threatened, and the number of structures destroyed (Fig. 4). The fastest-spreading fires were in the northern Great Basin area, along the border of Nevada and Idaho, where landscapes are dominated by fast-burning sagebrush and cheatgrass fuels. Additionally, the West experienced larger, faster fires requiring more resources and resulting in higher suppression costs. Suppression costs are an order of magnitude lower on average in the East versus the West, but societally impactful fires are not limited to the West (Fig. 4e,f). Smaller fires in the East threaten and destroy large numbers of homes. The allocation of firefighting personnel is greatest in California and the Pacific Northwest, with California committing the most resources regionally. This allocation of resources may help to lessen the number of homes damaged or destroyed, particularly across densely populated fire-prone landscapes. Where it may not reduce destruction is in the hottest and driest portion of Southern California. Wildfires requiring ICS-209 reporting have a presence in all but parts of the Midwest and Northeast. In the Appalachian Mountain range, the influence of human ignitions has led, in recent years, to large areas burned and associated ICS-209 reporting. More research is needed to better understand the factors driving these apparent patterns.
Summary statistics were generated across the national Geographic Area Coordination Center (GACC) boundaries (https://gacc.nifc.gov/) for the six key metrics plus fire count and mean fire size, and total personnel (Table 17). Alaska had the largest reported average maximum fire spread rate and average fire size, despite having the second lowest fire count.
The numbers reported for personnel and structures threatened have been summed across status reports. This roughly approximates the number of personnel shifts worked and threat accumulated over days rather than physical headcount or structures, respectively. In CONUS, the Northwest region experienced the fastest fires, with an average maximum fire spread rate of 3,566 acres/day. The average fire size in the southern GACC (southeastern US) was 869 acres, or roughly 85% smaller overall. Though wildfires were substantially smaller in the southern GACC, it experienced the most total fire incidents (13,423) and third most destroyed structures (16,741). The state of California (Northern and Southern Ops) accounts for 45% of the total reported suppression costs, 49% of total personnel assigned, 53% of all structures threatened, 62% of all structures destroyed, yet only 15% of the total burned area across all GACCs.
National Interagency Fire Center (NIFC) comparison. We compared the ICS-209-PLUS dataset with annual fire statistics provided by the National Interagency Fire Center (www.nifc.gov) across the same time period. The ICS-209-PLUS dataset captured approximately 2% (range 0.6% to 3.5% annually) of the population of wildfires, accounting for approximately 80% of the acres burned (range 53% to 98% of acres burned annually) and 79% of the suppression costs (range 51% to 140% of costs annually). These numbers are roughly in line with expectations, but the annual variability indicates that there are still significant outliers in the values that are skewing the results.

Dataset
Case study. The 2017 Chetco Bar fire. The following summary is derived from the Government Accountability Office (GAO) report to congressional requesters (2020) 53 . The Chetco Bar fire was ignited by lightning in the Kalmiopsis Wilderness of Oregon. It was reported by an airline pilot on July 12th, 2017. The fire was burning primarily in the burn scar of the 2002 Biscuit Fire in steep, difficult-to-access terrain. A small crew of firefighters was deployed by helicopter on July 12th and again on the morning of July 13th. Helicopters dropped water on the fire, but by the afternoon of July 13th firefighting resources were pulled out citing safety concerns and a low probability of containment. By July 20th the fire had grown to just over 300 acres, forcing the closure of USFS roads and trails. In August, increasing temperatures and fire growth led to further closures and restrictions in wilderness areas. On August 17th, strong, hot, dry winds known as a Brookings or Chetco effect 54 began to drive fire growth and the fire crossed the Chetco River. On August 18th the fire grew dramatically in size and evacuations began near the town of Brookings, Oregon. Over the next few days evacuations continued to expand and by August 20th over 3,000 residents had been affected, six homes destroyed, and the fire grew to over 90,000 acres. By mid-to-late September cooler temperatures and increased moisture helped moderate the fire growth. It was declared fully contained on November 2, 2017. At its peak, over 800 firefighting resources were assigned to the fire, with over $69 million in suppression costs 55 .
There are 99 situation reports for the Chetco Bar fire spanning July 15, shortly after discovery, through final containment on November 2. Looking across key metrics on the 209 reports, one can see this fire narrative play out (Fig. 5). There is a close correlation across the peaks of key variables. There is a slight lag in the reporting of burned area, which may reflect delays in finalising estimates, particularly during the most volatile phases of large and fast-spreading fires. The onset of Brookings-effect winds is observable in the Max FSR (Fig. 5b), driving a steep incline in fire growth (Fig. 5c). These spikes correspond with structural threat (Fig. 5d) and home destruction (Fig. 5e). The estimated incident management costs rise steeply during the active growth phase of the fire and level out at $46 million. These estimates are likely to be conservative, particularly on large, fast-moving events with values based on what is known at the moment rather than a clear accounting of final cost. It may take months for the final numbers to be tallied on large-scale responses, and the values are therefore expected to differ between situation reports and final fire-reporting systems. The allocation of firefighting personnel ramps up steeply to just over 5,000 on September 1st, and then begins to taper off later in September. The number of structures threatened rises quickly, with the first threat reported as 25 structures at 6 pm on August 19th. This value jumps to 2,505 at 7 am the following morning with a total of seven structures destroyed including two residences. Structures threatened rose again on August 23rd to 4,503 at 6 am and 5,503 reported at 7:30 pm, coinciding with new evacuation advisories.
The values logged for both total personnel and structures threatened may represent the most knowable information in-the-moment. The tracking of personnel happens in real-time during a shift, and concrete estimates of homes and areas threatened by a wildfire are integral to incident management decisions. It is also likely that there is a lag between reported resource needs and the deployment of new firefighting personnel. We hypothesise that the first report of structures threatened may serve as an accurate indicator of the onset of social disruption and that the growth and levelling out of personnel indicates that the fire is in its most acute and socially disruptive phase with the onset of demobilisation of resources indicating the easing of the threat posed by a fire. In this case, the first structural threat is recorded at 7 pm on August 17th and jumps to 3,418 on August 20th. This figure reaches its peak at over 12,000 structures on September 1st, where it remains for twelve days. The number of personnel steadily increases through mid-September when resources begin to steadily drop as conditions ease.
The perimeter and progression map (5a) visually depicts the rapid growth of the fire. The black diamond inside the perimeter at the top right is the Point of Origin latitude/longitude from the Wildfire Incident Summary Report. The final perimeter outline is from the MTBS dataset 44,45 and the fire progression is constructed using the MODIS burned area data 48,49 linked through the FIRED ID 46,47 . Because the MODIS data records the last burn detection for each pixel, the maximum growth is recorded on day five of the fire, which is earlier than the formal reporting of this same acreage on the situation report. This highlights two important points. Knowing that there is a likely delay in the reporting of acres burned on the sitreps, we may be able to make adjustments to these values as part of any analysis. Additionally, given the strengths and weaknesses of the values in ICS-209 reporting and the inherent biases, users should consider applications of the ICS-209-PLUS dataset in conjunction with other wildfire data products, such as those from remote sensing platforms, when intending to relate fire-spread events to socioeconomic impacts and firefighting response.

Usage Notes
Wildfire activity has increased in the US over the past several decades [5][6][7][8] , and the ICS-209-PLUS dataset affords valuable, science-grade, situation-reporting information regarding the development and consequences of large or otherwise significant wildfire events. This dataset provides a unique perspective that augments other important data sources, from after-action agency fire reports (FPA-FOD 42,43 ) to satellite-based detections of active fire and burned area (i.e., from Landsat, MODIS, VIIRS 46,48,56 ).
Although the incidents in this dataset represent only 2% of all wildfires, they account for approximately 80% of suppression costs. The daily/regular situation reports capture the best in-the-moment information on fire growth and behavior, environmental conditions such as weather and terrain, firefighting response, and the built and natural assets at risk. As exposure of highly valued resources and assets to wildfire increases, data on this important population of fires provides opportunities to understand relationships between a changing fire environment, incident response levels, and potential social and ecological impacts.
This first revision (second edition) of the data product meets several key objectives. It aligns the underlying data model across the three versions of the system, such that the data product now spans 1999 to 2020. Substantial cleaning efforts and filling of missing values allow for more robust analyses across the full domain. Additionally, the code used to create this dataset is open source and can easily be extended to process and add subsequent years to the product. More work is needed to streamline the data processing and expedite publication phases. Instead of processing annually archived data, there is potential to build the capability to make research-ready situation reporting information available in near-real time for analyses and integration with other in-the-moment data, including from surface-weather stations and remote-sensing platforms.
One of the longstanding barriers to using the ICS-209 electronic archive has been the data quality and lack of alignment across application versions. As in most human-generated, observational datasets, much of the data has been entered free form and is difficult to standardize for research purposes after-the-fact. Our goal was to strike a balance between manual inspection and cleaning and programmatic data compilation. We focused manual efforts on high-value fields like those related to fire location and outcomes like size and cost, for which we felt cleaning would provide the highest initial return. We automated the standardisation of values across the reporting systems and auto-filled empty values wherever it made sense. Given their intended in-the-moment uses, the data will likely continue to require post-hoc refinement for research purposes. Means for improved quality control and near-real-time compilation could greatly improve the scientific value of the data, as research interests and analytical needs grow within the fire service.
Our hope is that making this dataset available will lead to cross-sector and cross-discipline work resulting in greater understanding of our nation's wildfire activity and response patterns, and associated impacts. Understanding the causes and consequences of wildfire is a complex task requiring expertise across disciplines and potentially benefiting those in the forefront of fire management, climate science, environmental hazards research, policy making, planning and development. The ICS-209-PLUS dataset provides an important level of detail that can be used in parallel with other sources of information, filling in gaps and providing a more complete and nuanced picture of the relationship between characteristics of wildfire, incident response, and the causes and consequences of threatening wildfires across the nation. There is potential to address a critical information need as we work to understand trends and address the impacts and consequences of fire in an evolving physical and social landscape. This dataset offers great benefit for future research, particularly if current limitations are addressed effectively through community-wide efforts to keep improving on this rich dataset in an open science framework.

Code availability
The source code used to create the ICS-209-PLUS dataset, the spatiotemporal linkage, and the ICS-FIRED linked database are publicly accessible on GitHub [39][40][41] .