RescueNet: A High Resolution UAV Semantic Segmentation Dataset for Natural Disaster Damage Assessment

Recent advancements in computer vision and deep learning techniques have facilitated notable progress in scene understanding, thereby assisting rescue teams in achieving precise damage assessment. In this paper, we present RescueNet, a meticulously curated high-resolution post-disaster dataset that includes detailed classification and semantic segmentation annotations. This dataset aims to facilitate comprehensive scene understanding in the aftermath of natural disasters. RescueNet comprises post-disaster images collected after Hurricane Michael, obtained using Unmanned Aerial Vehicles (UAVs) from multiple impacted regions. The uniqueness of RescueNet lies in its provision of high-resolution post-disaster imagery, accompanied by comprehensive annotations for each image. Unlike existing datasets that offer annotations limited to specific scene elements such as buildings, RescueNet provides pixel-level annotations for all classes, including buildings, roads, pools, trees, and more. Furthermore, we evaluate the utility of the dataset by implementing state-of-the-art segmentation models on RescueNet, demonstrating its value in enhancing existing methodologies for natural disaster damage assessment.


Background & Summary
In recent years, natural disasters have profoundly impacted various regions across the globe, primarily attributed to the escalating effects of climate change and other contributing factors.The impacts of these disasters are getting stronger and more lasting.Reducing economic losses and saving valuable human lives depends heavily on quick response from the rescue teams.Various computer vision techniques can significantly contribute to precise damage assessment by leveraging the visual elements inherent in imagery.Image classification, object detection, instance segmentation, and semantic segmentation represent distinct components within the field of computer vision.. Image classification is the task of assigning a label or class to an entire image.Object detection is another computer vision technique that works to identify and locate objects within an image or video.Specifically, object detection draws bounding boxes around these detected objects, which help to locate where said objects are in a given scene.Semantic segmentation is one of the most essential tools for scene understanding.Semantic segmentation performs pixel level classification of each object of an image with distinct boundaries.On the other hand, instance segmentation is a unique form of image segmentation that deals with detecting and delineating each distinct instance of an object appearing in an image.Recently, substantial progress has been achieved in the field of scene understanding in urban environments, primarily attributable to the application of deep learning methodologies.Several pioneering datasets like Cityscapes 1 , PASCAL VOC2012 2 , PASCAL Context 3 , and COCO Stuff dataset 4 are available and these large-scale datasets are helping in improvement of segmentation accuracy of urban scenes.Despite these advances in semantic segmentation, scene understanding after natural disaster remains challenging due to the absence of benchmark datasets.
Existent natural disaster datasets can be sorted into two classes.One is ground-level images [5][6][7] , and other one is satellite and aerial imagery [8][9][10] .The ground level images are collected from different sources including Virtual Disaster Viewer 6 , Open Images Dataset 6 , Google image search engine 6,7 , and social networks 5,11 .Weber et al. presented a large ground level post-disaster dataset for different incident classification like drought, wildfire, and snowstorm 12 .Images had been introduced to the AIDR system 11 by Nguyen et al. by collecting images from the social media 5 .Although these imageries are abundant, they lack geo location tags 13 and only provide classification labels (one label for entire image).
On the other hand, satellite and aerial imagery are collected from different satellites and remote sensing equipment.Notable works in this arena includes a post-tsunami aerial image dataset named ABCD (AIST Building Change Detection) 14 .This dataset helps in detection of flood affected buildings.Doshi et al. proposed a dataset 15 that combines SpaceNet 16 and DeepGlobe 17 .In order to assess the damages caused by hurricanes, Chen et al. proposed a dataset 8 by collecting images from two different sources including crowdsourced annotated data of DigitalGlobe satellite imagery and data collected by FEMA.A large dataset named xBD which consists of both pre-and post-disaster satellite images are proposed by Gupta et al. 9 for building damage assessment with four different damage categories.ISBDA (Instance Segmentation in Building Damage Assessment) is presented by Zhu et al. 13 which includes user-generated aerial videos collected from social media platform.
Besides satellite imagery, very few UAV based disaster datasets are also available.Kyrkou et al. proposed an image classification dataset named AIDER (Aerial Image Database for Emergency Response) 18 .AIDER consists of images from four different disaster events including Fire/Smoke, Flood, Collapsed Building/Rubble, and Traffic Accidents.Rahnemoonfar et al. presented a high resolution post-hurricane dataset named FloodNet 10 .This dataset provides pixel level annotation of nine classes for natural disaster damage assessment.The UAV images provided by FloodNet consists of flooded and non-flooded areas collected after hurricane Harvey.
In the field of natural disaster damage assessment, the research community currently faces two significant challenges.Firstly, there is a lack of comprehensive pixel-level annotation for post-disaster scenes.This issue arises due to the absence of datasets that provide thorough annotation of the entire scene at the pixel level.The few available datasets only offer annotation for a limited number of classes 9,13,19 .However, damage information for only a few classes does not provide a complete understanding of the scene.Other elements present in the images, such as roads, vehicles, damaged trees, and debris, play crucial roles in explaining the scene comprehensively and aiding in more accurate damage assessment.To address this gap, this paper introduces RescueNet, a low altitude and high-resolution natural disaster dataset.RescueNet provides pixel-level annotation for 10 classes, expanding across six distinct categories, including water, buildings, vehicles, roads, trees, and pools.The second challenge pertains to the classification of different damage levels within a category.For instance, after a hurricane or wildfire, a building or road can be damaged, but the extent of the damage is often not specified in most datasets.Moreover, a building can sustain different levels of damage, ranging from slight damage to complete destruction after a disaster.While xBD dataset 9 classifies buildings based on four different damage levels, it is a low-resolution satellite dataset, resulting in poor image quality .To address this limitation, RescueNet provides building segmentation labels with four distinct damage criteria, while ensuring the images are of very high resolution.Figure 1 illustrates some examples from the RescueNet dataset, showcasing the complexity of the scenes along with their corresponding annotations.Our annotation includes both classification and semantic segmentation labels on high resolution imagery.In addition to offering semantic segmentation annotation for different categories and sub-categories such as buildings and roads, RescueNet also provides three-class classifications for each neighborhood.These classifications include superficial damage, medium damage, and major damage.An initial version of RescueNet was published as HRUD 20 where only 1973 images were released.There are primarily four distinctions between HRUD and RescueNet: 1) the number of images has been increased from 1973 to 4494 in RescueNet, 2) the object class "Debris" and "Sand" (present in the HRUD) have been combined with the background class in RescueNet, 3) "Road" class has been divided into two classes namely "Road-Clear" and "Road-Blocked" in RescueNet, and 4) three image label neighborhood damage classifications have been introduced in RescueNet.A comparative study of various existing natural disaster datasets with RescueNet is presented in Table 1.As depicted in Table 1, it is evident that FloodNet and RescueNet are the sole datasets that offer comprehensive semantic segmentation labels for high-resolution UAV imagery.
In summary, the contributions of this paper are as follows: • Introduction of RescueNet, a high-resolution aerial imagery dataset captured using unmanned aerial vehicles (UAVs).
• Provision of high-quality annotation for 10 classes, including images from both affected and non-affected areas following Hurricane Michael.
• Annotation of building and road damages based on severity.Building damages are classified into four damage classes: superficial damage, medium damage, major damage, and total destruction.Roads are classified as either road-clear or road-blocked.
• Evaluation of various existing semantic segmentation methods on the RescueNet dataset, demonstrating its utility in future research on natural disaster damage assessment.

Data Collection
Hurricane Michael made landfall near Mexico Beach, Florida, on October 10, 2018, as a category 5 hurricane, one of the powerful and destructive tropical cyclones to strike United States since Andrew in 1992.The dataset was collected by the Center for Robot-Assisted Search and Rescue using small unmanned aerial vehicles (sUAVs) on behalf of Florida State Emergency Response Team at Mexico Beach and other directly impacted areas 22 after operating 80 flights conducted between October 11-14, 2018.The two most important features of this dataset are fidelity and uniqueness.Firstly, the data is collected by emergency responders during the response phase, utilizing small unmanned aerial vehicles (sUAVs).This ensures that the dataset aligns with current data collection practices and represents the information typically gathered during a disaster.Secondly, it is unique since it is the only known database of sUAV imagery for disaster.Note that there is other existing database of imagery collected using unmanned and manned aerial assets during different disasters, such as National Guard Predators or Civil Air Patrol.But compared to our dataset, those are collected using larger and fixed-wing assets that operate above 400 feet AGL (above ground level).While images were taken using DJI Mavic Pro quadcopters, two sets of videos were taken with Parrot Disco fixed-wing sUAV and one set at night with a DJI Inspire and thermal camera.But these videos are not included in the RescueNet.

Annotation for Semantic Segmentation
The V7 Darwin platform (V7Darwin) 23 is employed for the annotation of the dataset for semantic segmentation.The objective is to achieve comprehensive pixel-level annotation for each image in the dataset.To accomplish this, all objects present in the dataset are meticulously annotated.This allows us to gain a comprehensive understanding of the extent of damage caused by natural disasters.The objects annotated in the datasets are buildings, roads, trees, water, vehicles, and pools.The building class encompasses both residential and non-residential structures.In accordance with the FEMA guideline 24 , the building damages are further categorized into four levels: Building-No-Damage, Building-Medium-Damage, Building-Major-Damage, and Building-Total-Destruction.Table 2 provides a summary of the annotated polygons for these four classes.Although the FEMA guideline 24 includes scenarios where the conditions of buildings can be observed from all sides, aerial images provide only top views.Therefore, it becomes necessary to adapt the definitions based on these top views.Based on input from first responders, the building damage classes are defined as follows: • Building-No-Damage: This class is assigned when no damage is observed on a building.
• Building-Medium-Damage: This class is assigned when certain parts of a building are affected, requiring minimal repairs to make it habitable.For example, if the roof can be temporarily covered with a blue tarp, it falls under this classification.
• Building-Major-Damage: This class is assigned when the damage sustained by a building is significant enough to require extensive repairs, indicating substantial structural damage.
• Building-Total-Destruction: This class is assigned when two or more major structural components of a building collapse, such as basement walls, foundation, load-bearing walls, or the roof.
It is important to note that these definitions are adapted specifically for the top-view perspective provided by aerial images, taking into consideration the information and observations from first responders.The road annotations in RescueNet dataset are divided into two classes: Road Clear and Road Blocked.In the aftermath of a natural disaster, roads can become obstructed by various obstacles, including floodwater, sand, and debris from trees and buildings.These obstructed roads are challenging to access and are classified as "Road Blocked" in the dataset.On the other hand, roads that are not covered by any obstacles are classified as "Road Clear".The defining criteria for all classes, including buildings and roads, are summarized in Table 3.Several examples from the RescueNet dataset, along with their corresponding colored annotated masks, are depicted in Figure 1.In the first image, various undamaged buildings, vehicles, trees, and roads can be observed.The second image displays buildings with different levels of damage, alongside roads.Notably, the right side of the road is obstructed by debris, leading to its annotation as "Road Blocked," while the left side is annotated as "Road Clear."The third image showcases another damaged area with buildings classified into different damage categories, as well as a road.Due to the presence of debris, the road is annotated as "Road Blocked."Similarly, three more examples are presented in the third row, alongside their corresponding annotations displayed in the fourth row.

Annotation for Image Classification
Thus far, the discussion has primarily focused on the annotation process for semantic segmentation.However, in addition to pixel-level annotation, each image in the dataset is also classified into one of three classes: Superficial damage, Medium damage, and Major damage.This classification serves to represent the overall damage level of a specific area captured by the image.It serves as an image classification tag for each image.
The classification of an image is based on the extent of damage within the area covered by that image, encompassing both man-made and natural structures.If the image does not exhibit any damaged structures, it is classified as "Superficial damage".In the case where a few structures are damaged by the natural disaster, the image is labeled as "Medium damage".Finally, if the image contains at least one completely destroyed building or if approximately 50% of the area is covered with debris, it is categorized as "Major damage".

Quality Control
To ensure the quality of the annotation, a rigorous two-step verification process is employed for each image.An annotation guideline, which includes the definition of different classes and distinguishing features for various building damages as described in the "Data Annotation" section, is provided to the annotators.The annotation process involves pixel-level annotation of all objects present in the image, making it a time-consuming task that takes approximately one hour per image.
Quality control measures are implemented to maintain the accuracy and consistency of the annotations.Initially, each image is assigned to an annotator who performs the annotation.Once the annotation is completed, the image is passed on to a reviewer for verification.The reviewer meticulously examines the quality of the annotation, ensuring its adherence to the guidelines and overall accuracy.If any discrepancies or inaccuracies are identified, the image is returned to the annotator with detailed comments for further correction.This iterative cycle of review and correction continues until the reviewers are satisfied with the quality of the annotations.Through this meticulous review process, the dataset attains a high standard of annotation quality and consistency.The presence of a pool, regardless of whether it contains water, located adjacent to a building, is classified as "Pool".

Generation of Segmentation Mask and Classification Label
Annotation are performed on V7 Darwin Platform 23 .Annotation of each image are recorded in "json" data format.In the json files, objects of different classes are recorded as bounding boxes in dictionary format (key-value pair).Dictionary description of each object has several keys including polygons, name, and attributes.Different keys have different functionalities.Polygon acts as a bounding box of an object and contains corner points of that polygon.Each pixel inside of a polygon represents the corresponding class.The name denotes the class name of an object and the attributes key represents the sub-classification of that object class (if there is any).For example, if name denotes a "building" class, then attributes can denotes any of the four subclasses such as "No (Superficial) Damage", "Minor (Medium) Damage", "Major Damage", and "Total Destruction".In addition each json file contains a tag (another type of dictionary entry) which has a name key whose value starts with term "Neighborhood" to represent the overall damage classification of the image.For example, depending on the damage amount observed in an image, the name key can have one of the three classes: "Superficial Damage", "Medium Damage", and "Major Damage".
To generate masks for semantic segmentation task we utilize the information recorded in bounding box components of json files.As mentioned in the previous paragraph each json file contains bounding box dictionaries which keep records of object classification of each pixel.For an image we start with an empty mask and then update each pixel value by going through each bounding box of the corresponding json file.Following this procedure for each image, segmentation masks are generated for the entire dataset.The pixels with "Water" classes are labeled as 1, "Building No Damage" as 2, "Building Minor Damage" as 3, "Building Major Damage" as 4, "Building Total Destruction" as 5, "Vehicle" as 6, "Road-Clear" as 7, "Road-Blocked" as 8, "Tree" as 9, and "Pool" as 10.Rest of the pixels are considered "Background" and labeled as 0. On the other hand, to generate classification labels for image classification task we extract labels from the "Neighborhood" tag of the json file.The classification labels of images are recorded in ".csv" files where classes are represented as 0 when the classification label is "Superficial Damage", 1 when classification label is "Medium Damage", and 2 when the label is "Major Damage".

Semantic Segmentation
RescueNet 25 is a dataset collected from the areas affected by Hurricane Michael, utilizing UAVs.The dataset comprises multiple classes, including water, building-no-damage, building-minor-damage, building-major-damage, building-total-destruction, vehicle, road-clear, road-blocked, tree, and pool.Notably, RescueNet offers an extensive range of semantic labels specifically for buildings, based on their respective damage levels.Table 2 provides an overview of the dataset's building annotations, indicating the presence of 4,011 polygons labeled as building-no-damage, 3,119 polygons labeled as building-minor-damage, 1,693 polygons labeled as building-major-damage, and 2,080 polygons labeled as building-total-destruction.The distribution of pixels across different classes in the RescueNet dataset is visualized in Figure 2.These statistics and visualizations demonstrate the comprehensive coverage and diversity of the dataset, particularly in terms of building damage annotations.Some sample images from this category are shown in Figure 1.

Image Classification
In addition to pixel-level annotation (semantic segmentation), the RescueNet 25 dataset also includes image-level annotations.Each image in the dataset is categorized into one of three damage levels: superficial damage, medium damage, or major damage.In RescueNet, 1389 images are classified as "Superficial Damage", 1906 images as "Medium Damage", and 1199 images as "Major Damage".Some sample images from this category are shown in Figure 1.

Dataset Splits
We partitioned our dataset into three subsets: training, validation, and testing.However, it should be noted that the dataset includes both areas affected by disasters and non-affected areas.Some images depict areas covered in debris with a variety of buildings exhibiting different damage labels, such as superficial, medium, major, or total destruction.To ensure a balanced distribution of images from both affected and non-affected areas across the three subsets, we utilized the image classification of the dataset where all images of the dataset are categorized into one of the three categories: superficial damage, medium damage, and major damage.
From each category, 80% of the images were allocated to the training set, 10% to the validation set, and the remaining 10% to the test set.This distribution strategy ensured a representative distribution of images across the subsets.The training set consists of 3,595 images, the validation set contains 449 images, and the test set contains 450 images.The distribution of image classes (Superficial, Medium, Major) in the dataset is shown in Table 4.

Technical Validation Problem Details
The objective of semantic segmentation, one of the fundamental tasks in computer vision, is to assign label to each pixel of an image.RescueNet 25 consists of post-disaster images and provide pixel level annotation of 10 object classes.For any image in the dataset, the RescueNet has annotation of all objects present in the scene along with detailed damage levels of some of the classes such as buildings, and roads.
PSPNet architecture can be divided into two stages: feature map generation and pyramid pooling module.The first stage generate feature map from an input image either using transfer learning or from scratch using dilated convolutions.Dilated convolutions gather large size area information using smaller kernels for higher dilation rates by keeping dimension same as input image.In the pyramid pooling module, feature maps are average pooled at different pool size.The reason behind using different pooling scales is to correctly segment objects of all sizes.Since an image contains objects of various sizes ranging from small area to large area in different regions, pooling at different scales can segment objects with different sizes.DeepLabv3+ 27 is an encoder-decoder semantic segmentation model which upgrades DeepLabv3 30 by solving the issue of lowering of prediction accuracy and loss of boundary information due to multiple downsampling operations.DeepLabv3+ implements atrous convolution/dilated convolution to enlarge the field of view of the kernels and thus solving the issue of usage of regular downsampling operations.Moreover, application of depthwise separable convolution to both atrous spatial pyramid pooling and decoder modules results in faster segmentation network.Attention U-Net 29 introduces a novel grid-based attention gate module which allows attention coefficients to be more specific to local regions.This gating mechanism has been added to U-Net 31 as an extension which improves the model sensitivity to foreground pixels without requiring complicated heuristics.Finally, Segmenter 28 is a vision transformer based segmentation network which relies on the output embeddings corresponding to image patches and obtain class labels from these embeddings with a point-wise linear decoder or a mask transformer decoder.

Training Details
We implement the methods using PyTorch and use NVIDIA GeForce RTX 2080 Ti GPU and Intel Core i9 CPU as hardware.We use resnet101 as backbone for PSPNet and DeepLabv3+.We implement "poly" learning rate where base learning rate is 0.001.All the models use the following hyperparameters settings.Momentum is set to 0.9, weight decay to 0.00001, power to 0.9, and weight of the auxiliary rate to 0.4.For data augmentation we implement scaling, flipping, random shuffling, and random rotation.Data augmentation helps in avoiding overfitting.We resize the images to 713 × 713 during training.

Evaluation Metric
There are several evaluation metrics for computer vision like Accuracy, F1 Score, and Intersection Over Union (IoU).However, IoU is the most popular and precise metric for the evaluation of the performance of a segmentation model.IoU metric, also referred to as the Jaccard index, is essentially a method to quantify the percent overlap between the target mask and the predicted output.The IoU metric measures the number of pixels common between the target mask and predicted mask divided by the total number of pixels present across both masks.

Experimental Result Analysis
The experimental results are presented in Table 5.It is evident from the table that the attention-based method, Attention UNet 29 , achieves the best performance compared to the non-attention-based methods.Among the non-attention-based methods, PSPNet 26 , which implements a pyramid pooling architecture, demonstrates better performance than the atrous convolutional operation-based method DeepLabv3+ 27 .Performance of a transformer based method Segmenter have also been presented in Table 5 with two different backbones: ViT-Tiny and ViT-Small.From the experimental results it is evident transformer based model with heavier backbone achieves better results.All these results indicate that the proposed RescueNet dataset can be effectively utilized to extract scene information and detect objects after a disaster using deep learning methods.
RescueNet has also been explored from the perspective of transfer learning.We performed transfer learning on a similar semantic segmentation dataset named FloodNet by implementing DeepLabv3+ and PSPNet.In transfer learning framework, we first train the deep learning models on RescueNet and then use the learned weights to train on FloodNet.The results are shown in Table 6.From the results we can observe that transfer learning based training improves the performance of deep learning models on segmentation task.Therefore, besides sole usage of RescueNet, it can also be utilized to improve the accuracy of deep learning models on other datasets.

Usage Notes
There are several important practical and academic research applications of RescueNet: Damage classification of buildings: RescueNet provides explicit classification of building damages based on different damage levels.This information is crucial for first responders and rescue planners, as it allows them to make accurate decisions regarding the allocation of rescue efforts.
Road segmentation with debris: RescueNet includes segmented images of roads, classified as either "Road-Clear" or "Road-Blocked".The label "Road-Blocked" indicates that the roads are covered with flood water or debris from destroyed buildings or trees.Conversely, the label "Road-Clear" indicates that the roads are undamaged and not obstructed by any obstacles.A machine learning model with accurate segmentation of both categories of roads can greatly improve rescue planning and assist the rescue team in finding the most efficient routes to reach those affected by the disaster.
Usability in future disaster events: Given the occurrence of various natural disasters worldwide every year, it is crucial to accurately estimate the damage caused by these events.Understanding the different components of each affected area is imperative in this process.By training a machine learning model with RescueNet, it becomes possible to promptly identify damages in these areas, which can help save human lives and reduce infrastructure costs.Furthermore, RescueNet can be combined with other datasets and utilized in semi-supervised and self-supervised learning approaches to leverage its rich semantic labels, thereby enhancing the effectiveness of disaster assessment and response strategies.

Figure 1 .
Figure 1.Illustration of complex scenes of RescueNet dataset.First and third rows show original images, and the respective lower rows show the corresponding annotations (both semantic segmentation and image classification).

Figure 2 .
Figure 2. Pixel distribution of different classes in RescueNet.

Figure 3 .
Figure 3. Visual comparison of semantic segmentation algorithms on RescueNet test set.

Table 1 .
Overview of existing natural disaster datasets.

Table 2 .
Number of polygons of different buildings based on their damage levels.

Table 3 .
Instance types and classes in RescueNet.thatexhibitno or minimal visible damage on their roofs are classified as "Building No Damage".Building Minor DamageBuildings with partially damaged roofs, allowing the use of tarps for coverage, fall under the category of "Building Minor Damage".

Table 4 .
Distribution of classification labels (Superficial Damage, Medium Damage, Major Damage) in RescueNet.

Table 5 .
Per-class results on RescueNet testing set.

Table 6 .
Per -class results on FloodNet testing set based on RescueNet pretrained weights (✓ represents transfer leaning based training and ✗ represents non-transfer learning based training).