Background & Summary

About 30% of global final energy consumption and 27% of total energy sector emissions stem from building operations. After a short drop during the COVID-19 pandemic, emissions and energy consumption are both now above their pre-COVID level of 2019, showing that no sustained reduction trend has yet begun1.

A major lever for reducing the energy consumption of building operations is the improvement of building envelopes, which is critical for reducing heating and cooling intensity2. A thermal bridge is a discontinuity in a building’s envelope whose thermal properties differ fundamentally from those of the adjacent enveloping surface3. With increasing demands on the quality of building envelopes, the minimization of thermal bridges is becoming ever more important, since losses from thermal bridges can account for up to one third of a building’s transmission heat loss4,5. Beyond increased energy consumption, thermal bridges can lead to a wide range of problems, from the risk of condensation and mold infestation6 to reduced comfort caused by cold inner surfaces of a building7. In summer, thermal bridges lead to increased heat absorption by buildings and thus can increase the need for air conditioning3.

Thermography can reliably detect thermal bridges in building envelopes8. In recent years, not only individual buildings but also buildings in their urban context have gained importance for developing adequate retrofit strategies. The New Urban Agenda of the United Nations (UN) puts a spotlight on policies affecting urban structures at all appropriate levels, recognizing that building design is one of the “greatest drivers of cost and resource efficiencies”9. When studying building stocks in cities, city districts, and villages, thermographic images can be collected with Unmanned Aerial Vehicles (UAVs/drones)10,11. Compared to classical thermography with static cameras, thermography with drones is especially advantageous because it saves time and resources and scales to large areas10. UAV-based thermographic systems are particularly beneficial when examining rooftops, which are difficult to record with hand-held cameras. Previously, thermographic rooftop inspections had to be carried out on site at night, which is labor-intensive and dangerous and cannot achieve the coverage feasible with drones12.

Manually evaluating the large number of thermographic images collected in urban areas is time-consuming. The detection of thermal bridges can be automated, but this is not trivial. Current approaches for automated thermal bridge detection work mostly with temperature threshold values and pattern recognition13,14,15,16. It is, however, difficult to find threshold values that can be generally applied to all types of thermal bridges17. Patterns and temperatures differ depending on the materials and building components where thermal bridges occur, on environmental conditions, and on recording settings. For example, windows appear cooler in thermographic images because glass surfaces are highly reflective18. Furthermore, simple threshold methods are prone to misinterpretations, e.g. caused by open windows. Deep learning methods may overcome these problems and provide better results, but they require annotated image datasets.

In this data descriptor, we present the Thermal Bridges on Building Rooftops (TBBR) dataset. To the best of our knowledge, it is the first comprehensive aerial thermographic image dataset that both provides height map information and is fully annotated for district-scale segmentation of thermal bridges on building rooftops. It is organized and structured according to the FAIR principles19, i.e. it is findable, accessible, interoperable, and reusable.

The remainder of the data descriptor is organized as follows: the Methods section describes the environmental conditions and methodological approach in recording the TBBR dataset. Data Records details the organization of the data, including file formats, how the data has been preprocessed and curated, and how to obtain it from a publicly available data repository. In the Technical Validation section, we highlight data quality aspects of TBBR. Finally, the Usage Notes section sketches current and prospective use cases for the data with an emphasis on (semi-)automated thermal bridge object detection and instance segmentation.


The raw images for our dataset were recorded with a Zenmuse XT2 visual (RGB) and a FLIR Tau 2 (thermal) camera (see Table 1 for details) on a DJI M600 drone. They were recorded at flight heights between 60 and 80 m above ground with a flight speed of \(1\,\frac{\mathrm{m}}{\mathrm{s}}\) and contain GPS information. The images cover six large blocks of around 20 buildings per block in the city center of Karlsruhe, Germany, with a total fly-over area of roughly \(48500\,\mathrm{m}^{2}\) (see Fig. 1). Because of the high image overlap, each building appears on average in about 20 images taken from different angles. All images were recorded during drone flights on Tuesday, 19 March 2019, from 7 am to 8 am (UTC+02:00). At this time, temperatures were between 3.78 °C and 4.97 °C, and humidity was between 80% and 98%. There was no rain on the day of the flights, but \(2.3\,\frac{\mathrm{mm}}{\mathrm{m}^{2}}\) was recorded 48 hours beforehand. For all images, an exposure time of 1/100 s and an ISO speed rating of 128 were used. For recording the thermographic images, an emissivity of 1.0 and an aperture of F1 were set. For the RGB images, an aperture of F1.8 was used. The global radiation during this period was between \(38.59\,\frac{\mathrm{W}}{\mathrm{m}^{2}}\) and \(120.86\,\frac{\mathrm{W}}{\mathrm{m}^{2}}\). No direct sunlight is visible in any of the recordings. Further environmental conditions are shown in Table 2. We do not provide information on the recorded buildings’ internal temperatures; for estimates, we refer readers to the corresponding German DIN standards20.

Table 1 Technical specifications of the cameras used in recording the TBBR raw data.
Fig. 1 Geo-located map of drone flyover regions (left, WGS 84 coordinate system, source: Google Maps), DJI M600 drone (upper right), and Zenmuse XT2 camera with a FLIR Tau 2 thermal sensor (lower right). Dashed lines show the flight paths of the drone, polygons the photographed regions. Numbers correspond to the identifier of each flight path, e.g. 2 for Flug1_102 (see Data Records section below). Image source for the drone and camera: © DJI.

Table 2 Environmental conditions during the flyover on 2019-03-19 as measured by the closest weather station in Rheinstetten, 48°58′21.4″N 8°19′48.4″E (WGS 84 coordinate system, source: DWD OpenData).

The full set of raw images captured contained a total of 5698 images before preselection21. Preselection involved the removal of all blurry images, e.g. due to rapid movement or turning of the drone, and all images containing no visible thermal bridges. After preselection a total of 926 images remained.

The RGB and thermal drone images were fused with a computed height map. All images were converted to a uniform format of 4000 × 3000 px, aligned, and cropped to 3370 × 2680 px to remove empty borders. The annotations only include thermal bridges that are visually identifiable with the human eye. Because of the aforementioned image overlap, each thermal bridge is annotated multiple times from different angles. The thermal images were annotated with the VGG Image Annotator22 (version 2.0.10) from the Visual Geometry Group. The thermal bridge annotations are outlined with polygon shapes. The polygon lines were placed as close as possible to, but outside of, the area of significant temperature increase. If a detected thermal bridge was partially hidden by a minor building component in the foreground, the annotation was drawn across the covering component. Adjacent thermal bridges that affect different rooftop components were annotated separately. For example, a window with a poorly insulated reveal located within a poorly insulated roof area is annotated individually. There is no overlap between annotated areas. While every image contains annotations, images may also contain thermal bridges that are not annotated because they are not clearly identifiable, e.g. too small for accurate identification or unclear due to the camera perspective.

Image preparation

The image registration and alignment procedure is shown in Fig. 2. The procedure involves three main steps:

  1. distortion correction,
  2. registration and alignment,
  3. cropping and stacking.

Fig. 2 Image registration and alignment procedure.

Distortion correction followed the procedure established in previous works23,24. In short, a reference image was used to determine distortion coefficients, cv2.getOptimalNewCameraMatrix() was used to find a new camera matrix, and cv2.undistort() was used to correct distortion. Both functions are part of the computer vision library OpenCV25.

Image registration and alignment was then performed by transforming the RGB and height map images onto the thermal images, as the annotation of thermal bridges was performed on these. A homography matrix was calculated using a total of 316 coordinate pairs from 21 RGB and thermal images. This homography matrix was then used to transform all RGB images in the dataset. Since the height map was created from the RGB images, we also used this homography matrix to transform the height map images.
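The source does not state which routine was used to compute the homography matrix. As a minimal sketch of the underlying idea, the following NumPy-only example estimates a homography from point pairs with the normalized direct linear transform; OpenCV's cv2.findHomography() offers comparable functionality. The function names, the synthetic matrix, and the point coordinates are hypothetical and only serve to illustrate the technique.

```python
import numpy as np


def _normalize(pts):
    """Shift points to zero mean and scale them to a mean distance of sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - mean, axis=1))
    transform = np.array([[scale, 0.0, -scale * mean[0]],
                          [0.0, scale, -scale * mean[1]],
                          [0.0, 0.0, 1.0]])
    return (pts - mean) * scale, transform


def estimate_homography(src_pts, dst_pts):
    """Least-squares homography H such that dst ~ H @ src (normalized DLT)."""
    src, t_src = _normalize(np.asarray(src_pts, dtype=float))
    dst, t_dst = _normalize(np.asarray(dst_pts, dtype=float))
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h_norm = vt[-1].reshape(3, 3)
    # Undo the normalization and fix the arbitrary scale.
    h = np.linalg.inv(t_dst) @ h_norm @ t_src
    return h / h[2, 2]


def apply_homography(h, pts):
    """Map 2D points through a 3x3 homography."""
    pts = np.asarray(pts, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


# Synthetic example: hypothetical RGB pixel coordinates and their thermal matches.
true_h = np.array([[1.02, 0.01, 15.0],
                   [-0.02, 0.98, -8.0],
                   [1e-5, 2e-5, 1.0]])
rgb_pts = np.array([[100, 200], [3500, 150], [3800, 2900],
                    [250, 2700], [2000, 1500], [1200, 800]], dtype=float)
thermal_pts = apply_homography(true_h, rgb_pts)
est_h = estimate_homography(rgb_pts, thermal_pts)
```

With noisy, manually selected coordinate pairs such as the 316 pairs mentioned above, the same least-squares formulation applies, but the recovered matrix is then only an approximation.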

The final cropping and stacking was performed to create the 5-channel images of the TBBR dataset, output in the NumPy format26. Images are cropped to 3370 × 2680 px to remove large black borders present in thermal images, and subsequently stacked into the channel order [B, G, R, Thermal, Height].
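The cropping and stacking step can be sketched as follows. This is an illustration under assumptions, not the pipeline used to build TBBR: the crop window is anchored at the top-left corner here because the source does not state the crop offsets, and the function name and demo arrays are hypothetical.

```python
import numpy as np

CROP_W, CROP_H = 3370, 2680  # crop size stated in the text (width x height)


def crop_and_stack(bgr, thermal, height, crop_w=CROP_W, crop_h=CROP_H):
    """Crop aligned images and stack them as [B, G, R, Thermal, Height]."""
    # Assumption: top-left anchored crop; the true offsets are not given.
    bgr = bgr[:crop_h, :crop_w]
    thermal = thermal[:crop_h, :crop_w]
    height = height[:crop_h, :crop_w]
    # np.dstack promotes the 2D thermal and height arrays to single channels.
    return np.dstack([bgr, thermal, height])


# Small stand-in arrays instead of full 4000 x 3000 px images.
demo_bgr = np.zeros((5, 8, 3), dtype=np.uint8)
demo_thermal = np.full((5, 8), 120, dtype=np.uint8)
demo_height = np.full((5, 8), 90, dtype=np.uint8)
stacked = crop_and_stack(demo_bgr, demo_thermal, demo_height, crop_w=6, crop_h=4)
```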

Computation of the height map

Due to the high overlap of images, we can extract similarities from feature points identified in each image and conduct photogrammetry. Photogrammetry allows estimation of the three-dimensional coordinates of points on an object in a generated 3D space involving measurements made on images taken with a high overlap rate. Therefore, we can use this technique to create a 3D point cloud model of the recorded region.

We used the ContextCapture software to perform photogrammetry on the TBBR dataset. ContextCapture provides users with the intermediate information necessary to obtain each image’s estimated 3D coordinates and orientation23,24. This information allowed us to estimate the distance between points in 3D and 2D space and to project points from the 3D to the 2D space to generate the height maps. The pixels of the resulting 2D height map show the z-axis value (vertical height) of the corresponding points of the 3D point cloud model, normalized to the 8-bit range such that the lowest 3D model point maps to 0 and the height of the drone to 255.
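The 8-bit normalization of height values can be illustrated with a short sketch. It assumes a linear mapping between the lowest model point and the drone height, which the source does not state explicitly; the function name and the example heights are hypothetical.

```python
import numpy as np


def normalize_height(z, z_lowest, z_drone):
    """Linearly map heights so that z_lowest -> 0 and z_drone -> 255 (uint8)."""
    z = np.asarray(z, dtype=float)
    scaled = (z - z_lowest) / (z_drone - z_lowest) * 255.0
    # Round to the nearest integer and clip to the valid 8-bit range.
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


# Hypothetical z-values in metres for four projected points.
z_values = np.array([[110.0, 125.0], [140.0, 180.0]])
height_map = normalize_height(z_values, z_lowest=110.0, z_drone=180.0)
```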

Data Records

The Thermal Bridges on Building Rooftops (TBBR) data is publicly available on Zenodo27 and is licensed under Creative Commons Attribution 4.0 International. The 926 images in the dataset are made available as a series of compressed archive files totaling 68.5 GB. Each archive corresponds to one of the six flight paths, named Flug1_100 to Flug1_105 (the word “Flug” means flight in German). The archives contain NumPy26 files (one per image) of shape (2680, 3370, 5), where the final dimension holds the channels in the order [B, G, R, Thermal, Height]. An example image (Flug1_100, ID: 523) is depicted in Fig. 3. Archives were compressed using ZStandard compression28 and can be decompressed with utility programs such as tar or unzstd. Corresponding annotations are provided in the COCO JSON format29 and were automatically generated by the VGG Image Annotator.
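As a minimal sketch of how one of the NumPy files might be read and split into its channels, the following example writes a small stand-in array to a temporary file and loads it again. The file name example.npy and the helper split_channels are hypothetical; real TBBR files have the shape (2680, 3370, 5).

```python
import os
import tempfile

import numpy as np

CHANNELS = ("B", "G", "R", "Thermal", "Height")  # channel order from the text


def split_channels(image):
    """Return a dict mapping channel names to 2D arrays."""
    if image.ndim != 3 or image.shape[-1] != len(CHANNELS):
        raise ValueError(f"expected shape (H, W, {len(CHANNELS)}), got {image.shape}")
    return {name: image[..., i] for i, name in enumerate(CHANNELS)}


# Stand-in for a decompressed TBBR file (hypothetical name and tiny size).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.npy")
    np.save(path, np.zeros((4, 6, 5), dtype=np.uint8))
    image = np.load(path)

channels = split_channels(image)
# Reorder B, G, R into R, G, B for plotting libraries that expect RGB.
rgb = np.stack([channels["R"], channels["G"], channels["B"]], axis=-1)
```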

Fig. 3 Example image from the TBBR dataset (Flug1_100, ID 523) showing the different channels, RGB (left), thermal (center), and height map (right), including overlaid annotations.

One of TBBR’s main design objectives was to facilitate (semi-)automated thermal bridge pattern detection algorithms30 (see Usage Notes). Accordingly, the data is pre-split into train and test subsets with 723 images (5614 annotations) and 203 images (1313 annotations), respectively. There is one COCO JSON annotation file for each subset, i.e. one for the training data (Flug1_100Media to Flug1_104Media) and one for the test data (Flug1_105Media). The latter block serves as a hold-out test dataset to standardize the assessment of out-of-sample generalization performance.
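The image and annotation counts of a subset can be checked directly from a COCO JSON file, since the format stores images and annotations in top-level lists linked by image_id. The sketch below uses a tiny in-memory stand-in; the file names and IDs inside it are hypothetical and not taken from TBBR.

```python
import json


def count_coco(coco):
    """Return (number of images, number of annotations) of a COCO-style dict."""
    return len(coco["images"]), len(coco["annotations"])


def annotations_per_image(coco):
    """Count annotations per image ID."""
    counts = {img["id"]: 0 for img in coco["images"]}
    for ann in coco["annotations"]:
        counts[ann["image_id"]] += 1
    return counts


# Tiny stand-in for an annotation file (hypothetical names and IDs).
example = json.loads("""
{
  "images": [
    {"id": 1, "file_name": "Flug1_100Media/a.npy", "width": 3370, "height": 2680},
    {"id": 2, "file_name": "Flug1_100Media/b.npy", "width": 3370, "height": 2680}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1},
    {"id": 11, "image_id": 1, "category_id": 1},
    {"id": 12, "image_id": 2, "category_id": 1}
  ],
  "categories": [{"id": 1, "name": "thermal bridge"}]
}
""")

n_images, n_annotations = count_coco(example)
per_image = annotations_per_image(example)
```

For a real annotation file, the dictionary would be obtained with json.load() on the opened file instead of the inline string.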

The experimental metadata was structured with the SpatioTemporal Asset Catalog (STAC) specification family. This specification provides a standardized way of describing geo-spatial assets. It defines the related JSON object types Item, Catalog, and Collection, with Collection extending Catalog. Moreover, STAC objects can be extended with other specifications, which provides a mechanism for adding further metadata. Such an approach addresses the need for a common understanding of experimental metadata, ideally based on a widely accepted standard31.

The STAC Collection JSON object Flug1_collection_stac_spec provides information about the recorded images and the environmental conditions during recordings. It also contains information about the overall bounding box of the entire area in which images were recorded. It links to related STAC Item JSON objects containing information about the recorded city blocks and the cameras. The objects for the six flight paths, i.e. Flug1_100_stac_spec, Flug1_101_stac_spec, Flug1_102_stac_spec, Flug1_103_stac_spec, Flug1_104_stac_spec, Flug1_105_stac_spec, contain the GeoJSON32 geometry of the respective block and the corresponding bounding box.

The objects containing the camera information, named Flug1_camera1_stac-spec for the RGB camera and Flug1_camera2_stac-spec for the Thermal camera, are based on an existing STAC extension for camera related metadata. All STAC Item objects have a link to the Flug1_collection_stac_spec Collection object.

Metadata of the archived NumPy files for each image was structured using the Data Package schema from the Frictionless Standards. This standard describes a collection of data files. Therefore, metadata about all containerized NumPy files of the six flight paths is provided within a JSON-based file, named Flug1_100-105_frictionless_standards.

All files are represented in a standardized way as FAIR Digital Objects (FAIR DOs) to enable machine-actionable decisions on the data in the spirit of the FAIR principles33. This representation further facilitates reproducibility of experiments performed using TBBR and the detection of data errors34. Thus, each file deposited in Zenodo was assigned a Persistent Identifier (PID), which is resolvable with the Handle.Net Registry (HNR). The full list of PIDs is given in the TBBR Zenodo dataset description27.

Technical Validation

The visual identification process and description of thermal bridges on building rooftops was based on typical patterns described in German DIN standards35,36,37 and thermal infrared inspections38. We note, however, that the interpretation of thermal images for building audits is currently always performed by human operators, which involves a high level of subjectivity13.

Thermal bridges occur on different parts of rooftops. Table 3 provides an overview of the different roof types and rooftop components where thermal bridges were annotated.

Table 3 TBBR annotation and component overview.

All preselected images were first manually annotated by a single industrial engineer. Then, following the two-person principle, all annotations were subsequently reviewed independently by an expert supervisor and corrected when necessary.

We qualitatively compare the distributions of thermal and height map values of thermal bridges and background between the train and test subsets. Figure 4 shows the histograms of both distributions within their 8-bit channel ranges of [0, 255]. As expected, we observe a uniform distribution of thermal values across background pixels, while there is a distinct peak in warmer pixels for thermal bridges. Similarly, the fact that thermal bridges occur only on rooftops is reflected in their large height map values, while background pixels are distributed uniformly at the building level and, to a lesser extent, at street level.
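A comparison of this kind can be sketched by computing separate 8-bit histograms for annotated and background pixels of one channel. The sketch assumes that a boolean mask of annotated pixels is already available (rasterizing the polygon annotations is not shown); the function name and the toy data are hypothetical.

```python
import numpy as np


def class_histograms(channel, mask, bins=256):
    """Normalized 8-bit histograms for annotated (mask) and background pixels."""
    edges = np.arange(bins + 1)  # one bin per integer value 0..255
    annotated, _ = np.histogram(channel[mask], bins=edges)
    background, _ = np.histogram(channel[~mask], bins=edges)
    # Normalize to relative frequencies; guard against empty classes.
    return annotated / max(annotated.sum(), 1), background / max(background.sum(), 1)


# Toy thermal channel: warm pixels (200) are treated as annotated.
thermal = np.array([[10, 10, 200], [10, 200, 200]], dtype=np.uint8)
bridge_mask = thermal > 100
hist_bridge, hist_background = class_histograms(thermal, bridge_mask)
```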

Fig. 4 Histograms of thermal (left) and height map (right) pixel values of thermal bridges and background for both the train and test subsets within their 8-bit channel ranges of [0, 255]. Note that the height map values have been truncated slightly above their maximum at 170 for visual clarity. The zero-valued pixel peaks arise from narrow (~20 pixels) black borders remaining on the right side of images after cropping.

To quantitatively compare annotated distributions, we use scale-invariant feature transform (SIFT) descriptors39, which have been shown to be robust across a range of image transformations40, e.g. affine transformations, scale changes, and rotations. This makes them an appropriate basis for comparing thermal bridge images of rooftops taken from various distances and angles. Figure 5 shows the average Euclidean distances between all 128 SIFT descriptors for annotated thermal bridges and background pixels across the train and test subsets. We observe a small distance between like classes across both train and test subsets, and larger relative distances for unlike classes, indicating that annotated regions contain features that are consistently distinct from the background.
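The exact aggregation behind Fig. 5 is not spelled out in the text. One possible reading, shown here purely as an assumption, is the mean pairwise Euclidean distance between two sets of 128-dimensional descriptors. The descriptors below are synthetic random vectors, not real SIFT output, and all names are hypothetical.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance between every row of a and every row of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a[:, None, :] - b[None, :, :]  # shape (len(a), len(b), 128)
    return float(np.linalg.norm(diff, axis=-1).mean())


rng = np.random.default_rng(0)
# Synthetic stand-ins for descriptors of two classes in two subsets.
bridge_train = rng.normal(0.0, 1.0, size=(50, 128))
bridge_test = rng.normal(0.0, 1.0, size=(40, 128))
background_test = rng.normal(3.0, 1.0, size=(40, 128))

like_distance = mean_pairwise_distance(bridge_train, bridge_test)
unlike_distance = mean_pairwise_distance(bridge_train, background_test)
```

In this toy setup the like-class distance is smaller than the unlike-class distance by construction, which mirrors the qualitative pattern described for the real data.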

Fig. 5 Euclidean distances between SIFT descriptors for thermal bridges and background annotations between train and test subsets.

Usage Notes

The annotation files contain relative paths to the NumPy files. We recommend the folder structure shown in Fig. 6 for usage of TBBR in conjunction with computer vision libraries such as Detectron241 or MMDetection42, or with the provided TBBRDet library (see Code Availability).

Fig. 6 Recommended folder structure for TBBR dataset.

For image analysis pipelines, we recommend standardizing the images, i.e. centering each channel to zero mean and scaling it to a standard deviation of 1, to make the different channel ranges of the image data comparable:

$${Z}_{(w\times h,c)}=\frac{{I}_{(w\times h,c)}-{\overline{I}}_{(c)}}{\sigma {(I)}_{(c)}},$$

where Z is the transformed data, I the input images, overlines denote mean values, σ the standard deviation, and subscripts the shapes of the data. For ease of use, we have precomputed the resulting values:

$${\overline{I}}_{(5)}=[130.0,135.0,135.0,118.0,118.0]\quad \quad \sigma {(I)}_{(5)}=[44.0,40.0,40.0,30.0,21.0].$$
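Applied to a single image array, the standardization with the precomputed values could look like the following sketch. The mean and standard deviation vectors are taken from the equation above; the function name and the demo image are hypothetical.

```python
import numpy as np

# Precomputed per-channel statistics in the order [B, G, R, Thermal, Height].
MEAN = np.array([130.0, 135.0, 135.0, 118.0, 118.0])
STD = np.array([44.0, 40.0, 40.0, 30.0, 21.0])


def standardize(image, mean=MEAN, std=STD):
    """Standardize an (H, W, 5) image channel-wise via broadcasting."""
    return (image.astype(np.float32) - mean) / std


# Demo image whose pixels all equal mean + std, so every output value is 1.
demo_image = np.broadcast_to(MEAN + STD, (4, 6, 5))
standardized = standardize(demo_image)
```

Broadcasting over the last axis keeps the operation independent of the image size, so the same function works for full-size images and for crops.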