Background & Summary

Energy systems are essential in the global effort to reduce energy use and greenhouse gas emissions for climate change mitigation1,2. Achieving the ambitious regional and national emission reduction targets relies on the rapid transformation and decarbonization of current energy systems, particularly in densely populated urban areas3. As home to more than half of the global population, urban areas are responsible for around 75% of global primary energy consumption and 70% of anthropogenic greenhouse gas emissions4,5. With ongoing urbanization amid climate change, cities are expected to face evolving challenges brought by changes in energy demand, energy supply, and their interactions3. Addressing these challenges requires robust and reliable urban energy system modeling. Such modeling also plays a crucial role in supporting the planning and development of new energy infrastructures, guiding energy policy decisions, and enhancing the resilience of urban energy systems to extreme weather events and climate change.

Reliable urban energy system modeling should be able to integrate energy demand and supply sectors with weather and climate conditions6,7,8. This integration is particularly important for components of the energy system that are susceptible to weather fluctuations. For example, the outputs of variable renewable energy systems (e.g., solar and wind power), as well as the cooling and space heating demands of buildings, are heavily influenced by weather conditions9,10,11,12. To assess the performance of energy system components under typical weather conditions, typical meteorological year (TMY) data have been widely used in existing modeling efforts, especially for the building sector13,14,15,16,17,18. TMY data comprise yearlong, hourly meteorological and solar radiation data that represent typical weather conditions over a long period of time for a specific location. They essentially consist of 12 typical months of data selected from different calendar years. As one of the standard input weather datasets for energy system modeling, TMY data have also been modified with climate projection data using methods such as morphing for simulations under future scenarios19,20.

However, TMY data are insufficient for capturing interannual variability and the frequency and intensity of peak loads and demands under extreme meteorological conditions, as suggested in previous studies7,21,22. Moreover, TMY data are becoming increasingly outdated; for example, the latest official TMY collection, TMY3, was developed based on data only up to 2005 (ref. 23). Relying on TMY data for current urban energy system modeling can potentially lead to large uncertainties, particularly during peak periods. Alternatives aiming to capture extreme conditions, such as the extreme meteorological year, typical hot year, and typical cold year datasets, have been proposed using similar concepts of selecting representative months or years24,25,26. Although these datasets provide improved insight into extreme weather conditions, they still fall short of capturing the full spectrum of variability. In comparison, actual meteorological year (AMY) data, which include actual hourly weather data for specific locations and years, are indispensable for assessing the long-term trends, interannual variability, and historical extremes of energy systems22,27. Unfortunately, largely due to spatial and temporal data gaps in weather observations, long-term AMY data available for energy system modeling are still very rare.

Recent studies have attempted to address this data scarcity with gap-free, gridded AMY data derived from the results of numerical weather and climate simulations25,28. Nevertheless, this approach cannot fully substitute for observation-based AMY data for urban energy system modeling due to the potential biases and uncertainties inherent in these simulations. One of the notable limitations is the inadequate representation of urban climates (e.g., the urban heat island effect) in most regional and global climate models and reanalysis datasets29,30. To comprehensively understand the dynamics of urban energy systems and their dependence on local meteorological conditions and broader background climate, there is a pressing need for a long-term, gap-free, hourly weather dataset that covers a diverse array of urban areas.

Here we introduce the Historical Comprehensive Hourly Urban Weather Database (CHUWD-H) v1.0, a long-term (1998–2020), gap-free, and quality-controlled hourly weather dataset designed for urban energy system modeling in the United States (U.S.). CHUWD-H v1.0 includes data from 550 weather station locations, covering all 481 urban areas across the entire contiguous U.S. (CONUS). This database is primarily constructed from observations at ground-based weather stations, complemented by outputs from a physics-based solar radiation model and reanalysis data. The current version (v1.0) features 14 gap-free meteorological variables at hourly intervals. The accuracy of the gap-filled hourly data in CHUWD-H v1.0 exceeds that of other commonly used gap filling methods, as evidenced by a 10-fold Monte Carlo cross-validation. CHUWD-H v1.0 is publicly accessible through an online data repository31 and an interactive platform32. This database is expected to facilitate a broad spectrum of applications that extend well beyond urban energy system modeling.

Methods

Selection of weather stations

To develop a historical hourly weather database based on station observations for urban energy system modeling, the first step is to select representative stations. We selected a subset of weather stations from the TMY3 dataset, a widely used dataset developed by the National Renewable Energy Laboratory (NREL)23. This selection can facilitate comparisons of energy system modeling efforts based on the official TMY3 dataset and CHUWD-H v1.0. As the latest version of the TMY datasets, the TMY3 dataset encompasses 925 weather stations across the CONUS, constructed using measured and modeled data spanning either 30 years (1976–2005) or 15 years (1991–2005). Weather stations in the TMY3 dataset are categorized into three classes based on the quality of the source data: Class I stations have the lowest uncertainty, Class II stations have moderate uncertainty, and Class III stations have the most data gaps. Within the CONUS, there are 217 Class I, 564 Class II, and 144 Class III stations. This classification also indicates the general reliability and completeness of the weather observations at each station. To create a continuous, gap-free dataset, we prioritized Classes I and II stations in the station selection process. However, in the absence of Class I or II stations within a target urban area, Class III stations were also considered.

We then used the U.S. Census Bureau’s 2010 Topologically Integrated Geographic Encoding and Referencing (TIGER)/Line Shapefiles data33 to identify representative stations for urban areas. The U.S. Census Bureau delineates the boundaries of 481 densely developed urban areas (or “urbanized areas”), each with a population of at least 50,000. The urban boundary shapefile can be retrieved from the Census Bureau’s official website: https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2010.html. We identified 392 weather stations located directly within these urban boundaries, and an additional 162 stations in close proximity to at least one urban area. We further inspected the operational status of individual stations using weather observations (see “Ground-based hourly weather observations” section), which is critical to the future updates of CHUWD-H. Four stations that discontinued weather observations after ~2010 were then excluded from the database. Consequently, CHUWD-H v1.0 covers 550 stations across the CONUS, including 181 Class I stations (141 within urban boundaries), 323 Class II stations (223 within urban boundaries), and 46 Class III stations (26 within urban boundaries).

When retrieving the observational records from the selected 550 weather stations, we found that several stations in the official TMY3 dataset23 have inaccurate latitude, longitude, and/or time zone. These inaccuracies could influence source data retrieval processes. To address this, we leveraged an updated version of the TMY3 dataset developed by Climate.OneBuilding.Org (https://climate.onebuilding.org/WMO_Region_4_North_and_Central_America/USA_United_States_of_America/index.html), which has undergone extensive verification34. Additionally, we cross-referenced station information from the Integrated Surface Database (ISD)35 and the National Centers for Environmental Information (NCEI)’s Climate Data Online (CDO)36. This allowed us to identify discrepancies in station locations, time zones, and elevations. In total, we corrected geographical locations and/or time zones for 43 weather stations. Figure 1 shows the spatial distribution of all 550 stations in CHUWD-H v1.0.

Fig. 1

Spatial distribution of the 550 representative weather stations in CHUWD-H v1.0, color coded by classification according to the official TMY3 dataset. Class I stations have the lowest uncertainty, Class II stations have moderate uncertainty, and Class III stations have the most data gaps. Shaded areas in orange are urban areas with populations of at least 50,000. The basemap is World Terrain data from ArcGIS Pro.

Ground-based hourly weather observations

Hourly weather observations for representative stations were retrieved from the Integrated Surface Database (ISD) (https://www.ncei.noaa.gov/products/land-based-station/integrated-surface-database). ISD is a global database that includes hourly and synoptic surface observations from more than 100 original data sources35. As one of the flagship climate data products of the NCEI, ISD currently covers over 14,000 active stations. To ensure the high quality of data in ISD, various quality control measures were carried out, including validity checks, extreme value checks, internal consistency checks, and external continuity checks. These measures extend beyond the internal quality controls already present in source datasets (e.g., the Automated Surface/Weather Observing Systems; ASOS/AWOS). To further filter out duplicate values and sub-hourly data, we retrieved hourly air temperature, dew point temperature, sea level pressure, wind direction, wind speed, one-hour accumulated liquid precipitation, and six-hour accumulated liquid precipitation data from ISD-Lite for 1998–2021.

It is noteworthy that the same weather station can be listed under different station IDs over time. To minimize gaps in observational data, we compared the geographical locations, names, and elevations of stations in proximity to each target station to identify those that have undergone changes in name, ID, and/or location. For stations missing entire years of observations, we identified and used nearby stations with similar elevations (<100 m difference) and geographical conditions to fill in data gaps. The final dataset includes 312 stations merged from multiple station records, with nine of these stations partially supplemented with data from nearby stations. Note that data retrieved from ISD directly reflect raw observations, which may not be homogenized for some stations.

Given that most energy system models require meteorological data in local time for model input, we converted all hourly observations from Coordinated Universal Time (UTC) to local time. To maintain consistency with TMY3 data, we also removed February 29 in leap years from our database. We further converted the sea level pressure to surface pressure for each station using the hypsometric equation and the station’s elevation.
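
As an illustration of the pressure conversion, the following minimal Python sketch applies one common form of the hypsometric equation. The function name, the isothermal-layer simplification (using the station air temperature as the layer-mean temperature), and the constants are assumptions made for this example; the text above does not specify which form of the equation or which temperature was used.

```python
import math

G = 9.80665   # standard gravity, m s-2
R_D = 287.05  # specific gas constant of dry air, J kg-1 K-1


def sea_level_to_surface_pressure(p_sl_hpa, elevation_m, temp_c):
    """Estimate station (surface) pressure from sea level pressure.

    Assumption for this sketch: the layer between sea level and the station
    is isothermal at temp_c. Returns pressure in the same unit as p_sl_hpa.
    """
    temp_k = temp_c + 273.15
    return p_sl_hpa * math.exp(-G * elevation_m / (R_D * temp_k))
```

For a station at sea level the function returns the input unchanged; for a station at 1,000 m and 15 °C, a sea level pressure of 1013.25 hPa maps to roughly 900 hPa.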

Radiation data from NSRDB

Solar radiation data were retrieved from the National Solar Radiation Data Base (NSRDB) (https://nsrdb.nrel.gov/), developed by the NREL37. This database provides 4-km resolution solar irradiation data at 30-min intervals covering the entire CONUS from 1998 to the present. NSRDB uses the two-step Physical Solar Model (PSM), which computes solar radiation from satellite data under both clear sky and cloudy conditions with the Fast All-sky Radiation Model for Solar Applications (FARMS)38. More specifically, FARMS integrates cloud properties from satellite data, aerosol optical depth from Moderate Resolution Imaging Spectroradiometer (MODIS) and Modern-Era Retrospective analysis for Research and Applications Version 2 (MERRA-2) data, along with additional atmospheric and land surface data from multiple sources to calculate Global Horizontal Irradiance (GHI), Direct Normal Irradiance (DNI), and Diffuse Horizontal Irradiance (DHI). Evaluation against concurrent ground-based measurements suggests a mean bias error of ±5% for the estimated hourly GHI and ±10% for DNI39. We retrieved 30-min GHI, DNI, DHI, clear-sky GHI, clear-sky DNI, clear-sky DHI, and zenith angle from NSRDB for each station location, and converted these instantaneous values to hourly average (during the preceding hour) using the trapezoidal rule.
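
The conversion from 30-min instantaneous values to hourly averages can be sketched as below. This is a minimal Python illustration of the trapezoidal rule, assuming that samples fall exactly on the hour and half hour and that the series starts on the hour; the function name is hypothetical and the actual NSRDB timestamp alignment should be checked before use.

```python
def hourly_mean_from_half_hourly(values):
    """Average instantaneous 30-min samples over each preceding hour.

    Assumption: values[0] is taken on the hour, values[1] at the half hour,
    values[2] on the next hour, and so on. For the hour ending at sample i,
    the trapezoidal rule over two 30-min intervals gives
    (v[i-2] + 2*v[i-1] + v[i]) / 4.
    """
    hourly = []
    for i in range(2, len(values), 2):
        v0, v1, v2 = values[i - 2], values[i - 1], values[i]
        hourly.append((v0 + 2.0 * v1 + v2) / 4.0)
    return hourly
```

Because the trapezoidal rule is exact for linearly varying signals, a ramp from 0 to 200 W m–2 over one hour yields an hourly mean of 100 W m–2.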

Reanalysis data from MERRA-2

We further used the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) dataset as an additional resource to fill gaps in weather observations. As a reanalysis dataset developed by NASA’s Global Modeling and Assimilation Office, MERRA-2 is generated with the Goddard Earth Observing System (GEOS) atmospheric data assimilation system40. It provides hourly data from 1980 to the present at a spatial resolution of 0.5° latitude × 0.625° longitude. The current version includes several major improvements over its predecessor (MERRA), such as the assimilation of aerosol observations, improved characterization of stratospheric processes, and better representation of glaciated land surface processes. However, the coarse native resolution of MERRA-2 is not suitable for direct station-level gap filling. Here we used hourly air temperature, dew point temperature, surface pressure, wind speed, and wind direction from downscaled 4-km MERRA-2 data developed by NREL (https://nsrdb.nrel.gov/). The high-resolution temperature and pressure data were downscaled from the native resolution of MERRA-2 using an elevation scaling, while the high-resolution humidity and wind data were downscaled with a nearest-neighbor approach37. It is noteworthy that the elevations of the downscaled 4-km MERRA-2 grids may slightly differ from those of station locations. Although the elevation discrepancy for most stations is less than 100 m, which is likely to have only marginal effects on meteorological variables, we specifically corrected the pressure data for two stations where the elevation difference from the nearest MERRA-2 grid exceeds 100 m.

Quality control and gap filling methods

While the hourly irradiance variables from NSRDB are gap free, substantial gaps are prevalent in the raw observations from weather stations. Figures 2 and 3 illustrate the distributions of data gaps in time series of air temperature, dew point temperature, surface pressure, wind speed, and wind direction. Most data gaps occurred in the years prior to ~2005. On average, missing observations account for 5.46 ± 9.81% (mean ± 1 standard deviation or SD) for air temperature, 5.60 ± 9.89% for dew point temperature, 14.23 ± 26.84% for surface pressure, 5.72 ± 9.70% for wind speed, and 9.01 ± 9.50% for wind direction over the entire 23-year time series. No clear variations in data gaps were observed among different climate regions. While all stations recorded some measurements for air temperature, dew point temperature, wind speed, and wind direction over the 23-year period, there are 28 stations where surface pressure data were almost entirely missing (data gaps >99.99%).

Fig. 2

Percentage of missing hourly observations over the 23-year period (1998–2020) for (a) near-surface air temperature (Ta), (b) dew point temperature (Td), (c) surface pressure (Ps), (d) wind speed (WS), and (e) wind direction (WD) for all stations in CHUWD-H v1.0. Each dot represents an individual station. The basemap is the shaded relief map blended with a land cover palette from MATLAB.

Fig. 3

Percentage of missing hourly observations by year and station for (a) near-surface air temperature (Ta), (b) dew point temperature (Td), (c) surface pressure (Ps), (d) wind speed (WS), and (e) wind direction (WD) for all stations in CHUWD-H v1.0. Note that weather stations are grouped according to the nine climate regions defined by the NCEI51. In each subplot, each row shows the percentage of missing hourly observations for a specific station across different years.

Despite the internal quality controls conducted in the original data sources and ISD, we identified several potential erroneous data points (outliers) in the observations, which could introduce uncertainties into the subsequent gap filling process. Therefore, we carried out additional controls to remove these outliers based on the variations in raw data from both ISD and MERRA-2. Specifically, the acceptable upper and lower limits for temperature and pressure data were established using the mean ± 5 SD of ISD and MERRA-2 data and the range of MERRA-2 data over a 3-month moving window41,42. Any temperature or pressure data points falling outside these limits were considered outliers and subsequently removed. A similar procedure was applied to raw wind speed data, although a broader 10 SD threshold was used43. Following this quality control procedure, up to 0.09% of the ISD raw data were removed for individual stations. Additionally, all negative wind speed observations were removed.
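
A simplified version of the SD-based screening can be sketched in Python as follows. This sketch covers only the mean ± k SD limits computed from the values pooled within one moving window; it omits the additional criterion based on the range of MERRA-2 data, because the text does not state how the two criteria are combined. The function names and the use of None for missing values are assumptions.

```python
import statistics


def sd_limits(window_values, k=5.0):
    """Return (lower, upper) limits as mean -/+ k standard deviations.

    window_values: values pooled within one moving window (e.g., 3 months);
    missing values are represented by None and ignored.
    """
    clean = [v for v in window_values if v is not None]
    mean = statistics.fmean(clean)
    sd = statistics.pstdev(clean)
    return mean - k * sd, mean + k * sd


def remove_outliers(obs, window_values, k=5.0):
    """Replace observations outside the limits with None (treated as gaps)."""
    lower, upper = sd_limits(window_values, k)
    return [v if (v is not None and lower <= v <= upper) else None for v in obs]
```

Under this sketch, k = 5 would correspond to the temperature and pressure screening and k = 10 to the wind speed screening described above.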

We developed a multi-step gap filling (MSGF) approach to fill observational data gaps for all stations. This approach leverages data from both ISD and MERRA-2, aiming to retain as much observational data from ISD as possible. We constructed locally adaptive regression models that integrate available data from the target weather station, its nearest station, and the closest MERRA-2 grid44,45,46. For each missing data point, we first trained a multiple linear regression (MLR) model using available data at the target station and its nearest station as well as data from the nearest MERRA-2 grid within a specified moving window centered around the missing data point. The size of the moving window is 30 days (~ a month), 90 days (~ a season), 1 year, or 3 years for air temperature, dew point temperature, and wind speed, but 1 year or 3 years for pressure due to larger data gaps. The selection of the window size was determined by two key criteria: the statistical significance of the MLR model, which must achieve a p-value < 0.05 in a two-tailed t-test, and the number of non-missing observational data pairs from both stations within the window, which must be ≥ 10%. If the trained MLR model was statistically insignificant or if over 90% of observational data pairs were missing in the longest window (3 years), we switched to a simple linear regression (SLR) model trained using only the target station and its nearest MERRA-2 grid data, again selecting the moving window size based on statistical significance and data availability. If neither MLR nor SLR models sufficed, we filled the gap with data directly from the nearest MERRA-2 grid. To further enhance the robustness of the regression models, we excluded calm wind observations prior to constructing regression models. As an example, Figure 4 illustrates the MSGF approach for gap filling air temperature data.
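
The decision cascade described above can be sketched as follows. This is a minimal, hypothetical Python illustration rather than the authors' implementation: it assumes that windows are tried from shortest to longest, and that the fraction of non-missing data pairs and the regression p-value have already been computed for each candidate window (fitting the MLR and SLR models is omitted). The names WINDOWS_DAYS, pick_window, and choose_fill_method are introduced here for illustration only.

```python
# Candidate window sizes in days (30 d, 90 d, 1 yr, 3 yr). For pressure, the
# text indicates that only the 1-year and 3-year windows are used.
WINDOWS_DAYS = (30, 90, 365, 1095)


def pick_window(stats, windows=WINDOWS_DAYS, alpha=0.05, min_frac=0.10):
    """Return the first window meeting both criteria, or None.

    stats maps window size -> (fraction of non-missing pairs, p-value).
    Assumption: windows are tested in increasing order of size.
    """
    for w in windows:
        frac, p_value = stats[w]
        if frac >= min_frac and p_value < alpha:
            return w
    return None


def choose_fill_method(mlr_stats, slr_stats, windows=WINDOWS_DAYS):
    """Apply the MLR -> SLR -> direct MERRA-2 fallback order."""
    w = pick_window(mlr_stats, windows)
    if w is not None:
        return ("MLR", w)
    w = pick_window(slr_stats, windows)
    if w is not None:
        return ("SLR", w)
    return ("MERRA2", None)
```

For pressure, the same functions could be called with windows=(365, 1095).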

Fig. 4

The multi-step gap filling (MSGF) approach to fill a missing air temperature data point at the target weather station. MLR and SLR models are multiple linear regression and simple linear regression models, respectively. For clarity, this flowchart omits connections from statistically insignificant results in the MLR or SLR models based on 30-day, 90-day, and one-year windows (i.e., “No” decision in the light pink box “Statistically significant”) to the data gap decision boxes for subsequent longer windows (light green boxes on the left for MLR and right for SLR).

Unlike air temperature, dew point temperature, surface pressure, and wind speed, missing wind direction data for each weather station were directly filled with data from the nearest MERRA-2 grid. Additional quality controls were applied to the gap filled data, such as adjusting dew point temperature higher than air temperature and correcting negative wind speed data. Furthermore, we derived hourly relative humidity from the gap filled air temperature and dew point temperature data47.
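
As a hedged illustration of the humidity step, the Python sketch below derives relative humidity from air temperature and dew point temperature using a Magnus-type approximation for saturation vapor pressure. The specific formula and coefficients are assumptions chosen for this example and may differ from the method in the cited reference; capping the dew point at the air temperature is likewise only one possible form of the adjustment mentioned above.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus-type approximation over water (assumed coefficients)."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity(ta_c, td_c):
    """Relative humidity (%) from air and dew point temperature (deg C).

    Assumption: a dew point above the air temperature is capped at the air
    temperature, so the result never exceeds 100%.
    """
    td_c = min(td_c, ta_c)
    return 100.0 * saturation_vapor_pressure_hpa(td_c) / saturation_vapor_pressure_hpa(ta_c)
```

With these coefficients, an air temperature of 20 °C and a dew point of 10 °C give a relative humidity of a little over 50%.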

Data Records

The CHUWD-H v1.0 is publicly available through the Open Science Framework31. Hourly weather data from 1998 to 2020 for each station are organized into individual csv files for each year under individual project components, resulting in a total of 12,650 csv files following the naming convention “S******_year_Lat_***_Lon_***_State_Class.csv”. For example, the file “S690150_2020_Lat_34.29_Lon_-116.15_CA_II” stores hourly data in 2020 for Station (S) ID 690150 (Twentynine Palms), a Class II station located in California (CA) at latitude 34.29° (34.29°N) and longitude –116.15° (116.15°W). Each csv file contains an annual time series of 8,760 hourly data points across 26 variables. These include five date and time variables, 16 meteorological variables, and five gap filling flag variables that indicate whether data points for air temperature, dew point temperature, pressure, wind speed, and wind direction were gap filled.
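
For users who need to index the files programmatically, the naming convention can be parsed with a short Python sketch such as the one below. The regular expression is inferred from the single example above and assumes a numeric station ID, a two-letter state code, and a class of I, II, or III; the function and field names are hypothetical.

```python
import re

# Pattern inferred from the example file name; the ".csv" suffix is optional.
FILENAME_RE = re.compile(
    r"^S(?P<station_id>\d+)_(?P<year>\d{4})"
    r"_Lat_(?P<lat>-?\d+(?:\.\d+)?)"
    r"_Lon_(?P<lon>-?\d+(?:\.\d+)?)"
    r"_(?P<state>[A-Z]{2})_(?P<tmy3_class>I{1,3})(?:\.csv)?$"
)


def parse_chuwd_filename(name):
    """Split a CHUWD-H file name into its metadata fields."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognized file name: {name!r}")
    fields = match.groupdict()
    return {
        "station_id": fields["station_id"],  # kept as a string
        "year": int(fields["year"]),
        "lat": float(fields["lat"]),
        "lon": float(fields["lon"]),
        "state": fields["state"],
        "tmy3_class": fields["tmy3_class"],
    }
```

Applied to the example above, the parser returns station ID "690150", year 2020, latitude 34.29, longitude -116.15, state "CA", and class "II".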

Note that CHUWD-H v1.0 also includes three auxiliary precipitation-related variables: total precipitable water from MERRA-2, and observed one-hour and six-hour liquid precipitation depths, sourced from ISD with existing data gaps. Due to the absence of reliable hourly precipitation sources for gap filling station-scale data, the current version does not include gap-free hourly precipitation data. However, these variables are included to support potential future applications such as model evaluation and the further development of CHUWD-H.

Details of all variables, their units, and definitions are summarized in Table 1. Information on all 550 weather stations in CHUWD-H v1.0 is provided in "CHUWD-H v1.0 stations.xlsx", accessible via the same data repository31, with variables detailed in Table 2.

Table 1 Summary of variables in CHUWD-H v1.0.
Table 2 Definitions of variables in “CHUWD-H v1.0 stations.xlsx”.

Figures 5 and 6 show major hourly variables averaged over the 23-year period (1998–2020) for 550 stations in CHUWD-H v1.0. To make this database more accessible to both researchers and the general public, we developed an interactive webpage32 using ArcGIS Online (Fig. 7). This platform features clickable weather station locations, each linked to a pop-up box that provides detailed information about the weather station and download links for annual hourly weather data files.

Fig. 5

Hourly (a) air temperature (Ta; °C), (b) dew point temperature (Td; °C), (c) relative humidity (RH; %), (d) atmospheric pressure (Ps; kPa), and (e) wind speed (WS; m s–1) averaged over the 23-year period (1998–2020) for all stations in CHUWD-H v1.0.

Fig. 6

Hourly (a) global horizontal irradiance (GHI), (b) clear-sky GHI, (c) direct normal irradiance (DNI), (d) clear-sky DNI, (e) diffuse horizontal irradiance (DHI), and (f) clear-sky DHI averaged over the 23-year period (1998–2020) for all stations in CHUWD-H v1.0. The unit is W h m–2.

Fig. 7

Data inspection and downloading platform for CHUWD-H v1.0, featuring (a) interactive station inspection, and (b) downloading of individual annual hourly weather files.

Technical Validation

In addition to thorough quality controls on both raw data and gap filled data, we carried out a 10-fold Monte Carlo cross-validation (MCCV)48,49 to evaluate the performance of the proposed MSGF approach against three commonly used gap filling methods. The alternative gap filling methods are: (1) Direct Replacement (DR), which fills missing data points with data from the nearest MERRA-2 grid; (2) an SLR model that uses the entire 23-year time series of data from both the target station and the nearest MERRA-2 grid; and (3) an MLR model that uses the entire 23-year time series of data from the target station, the nearest station (when data are non-missing), and the nearest MERRA-2 grid. Figure 8 shows the results of these four gap filling methods through 10-fold MCCV, evaluated by the Pearson correlation coefficient (r), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The MSGF approach consistently outperforms the other methods, achieving higher r values and much lower RMSE and MAE values. Specifically, the average MAE value is 1.04 °C for air temperature, 1.04 °C for dew point temperature, 0.07 kPa for surface pressure, and 1.13 m s–1 for wind speed (Table 3).
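
The three evaluation metrics, together with one random hold-out split of the kind repeated in Monte Carlo cross-validation, can be sketched in Python as follows. The function names, the withheld fraction, and the seeding are assumptions made for illustration; the text does not state what share of observations was withheld in each of the 10 repetitions.

```python
import math
import random


def pearson_r(obs, est):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(obs)
    mo, me = sum(obs) / n, sum(est) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_e = sum((e - me) ** 2 for e in est)
    return cov / math.sqrt(var_o * var_e)


def rmse(obs, est):
    """Root mean square error."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


def mae(obs, est):
    """Mean absolute error."""
    return sum(abs(e - o) for o, e in zip(obs, est)) / len(obs)


def random_holdout_indices(n, frac=0.1, seed=0):
    """Indices of observations to withhold in one repetition.

    Assumption: frac (here 10%) is a placeholder value for this sketch.
    """
    k = max(1, int(round(frac * n)))
    return sorted(random.Random(seed).sample(range(n), k))
```

In each repetition, the withheld observations would be treated as gaps, filled by the method under evaluation, and compared with the original values using the metrics above.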

Fig. 8

Results of 10-fold Monte Carlo cross-validation, evaluated by the (a, d, g, and j) Pearson correlation coefficient (r), (b, e, h, and k) Root Mean Square Error (RMSE), and (c, f, i, and l) Mean Absolute Error (MAE), for (a–c) air temperature (Ta), (d–f) dew point temperature (Td), (g–i) surface pressure (Ps), and (j–l) wind speed (WS) using four gap filling methods: Direct Replacement (DR) with data from the nearest MERRA-2 grid, Simple Linear Regression (SLR) model constructed using 23-year data, Multiple Linear Regression (MLR) model constructed using 23-year data, and the multi-step gap filling (MSGF) approach proposed in this study. The box bounds the interquartile range divided by the median, with whiskers extending to ±1.5 times the interquartile range beyond the box. Circles are sample data points, and their distributions are represented by halved violin plots. Sample size N is 550 for air temperature, dew point temperature, and wind speed but 522 for surface pressure.

Table 3 Summary of cross-validation results (mean ± 1 standard deviation) for the multi-step gap filling (MSGF) approach. Note that units are for RMSE and MAE.

Usage Notes

The national coverage of CHUWD-H v1.0 will facilitate detailed hourly urban energy system modeling and enable cross-regional comparisons across cities under various realistic weather conditions, which will advance our understanding of how urban energy systems respond to extreme weather events and climate change. Originally developed for energy system modeling in urban areas, CHUWD-H v1.0 is also expected to support a wide range of applications in historical urban meteorological and climate studies, including the validation and evaluation of urban climate models. Moreover, it will serve as a valuable resource for future updates of CHUWD-H. Notably, parts of CHUWD-H v1.0 have been used in the first long-term, city-scale building energy consumption modeling across the entire U.S. to assess the impacts of climate change, population dynamics, and power sector decarbonization on urban building energy use12. Additionally, this database has supported analyses of causal interactions among U.S. cities during historical heat waves50.