North American historical monthly spatial climate dataset, 1901–2016

We present historical monthly spatial models of temperature and precipitation generated from the North American dataset version “j” from the National Oceanic and Atmospheric Administration’s (NOAA’s) National Centers for Environmental Information (NCEI). Monthly values of minimum/maximum temperature and precipitation for 1901–2016 were modelled for the continental United States and Canada. Compared to similar spatial models published in 2006 by Natural Resources Canada (NRCAN), the current models show less error. The root generalized cross validation (RTGCV), a measure of the predictive error of the surfaces akin to a spatially averaged standard predictive error estimate, averaged 0.94 °C for the maximum temperature models, 1.33 °C for minimum temperature, and 25.2% of the surface mean for total precipitation. Mean prediction errors for the temperature variables, using all stations, were less than 0.01 °C. In comparison, the precipitation models showed a dry bias (relative to recorded values) of 0.5 mm, or 0.7% of the surface mean. Mean absolute predictive errors for all stations were 0.71 °C for maximum temperature, 1.02 °C for minimum temperature, and 13.3 mm (19.3% of the surface mean) for monthly precipitation.

Station availability for the 2006 models varied over the 20th century; 1,221 stations were available from the U.S. Historical Climatology Network 6 , with an additional 46 stations for Alaska, and between 81 and 1,742 stations from the Meteorological Service of Canada (from 1901 to 1993). The number of U.S. stations available for the 2006 models increased from over 5,000 stations in 1951 to over 7,600 stations between 1971 and 2000.
The RTGCV, a measure of the predictive error akin to a spatially averaged standard predictive error estimate, averaged 0.94 °C for maximum temperature, 1.33 °C for minimum temperature, and 25.2% of the surface mean for precipitation from 1901 to 2016. RTGCVs for the current models were lower than those from 2006 4 for maximum temperature (1.03 °C) and precipitation (>30%), and similar for minimum temperature (1.3 °C). MAEs (all stations) averaged 0.71 °C (maximum temperature), 1.02 °C (minimum temperature), and 13.31 mm, or 19.3% of the surface mean (precipitation). While methodologies differ between the 2006 and current models, MAEs for precipitation were typically more than 30% of observed precipitation for the 2006 models, considerably larger than those presented in this study (under 20%).
Much of the improvement in the current models is due to efforts by NOAA and other agencies to rescue and restore historical temperature and precipitation records, as well as improved quality control processes. Northam "j" records were subjected to a homogenization process, and observations failing one or more quality control tests were identified and removed. The models presented in this paper also benefited from more systematic anomaly detection using studentized residuals. Despite these improvements, there are regional variations in model predictive accuracy, with coastal (Pacific and Atlantic) stations having the highest levels of predictive error for precipitation, particularly in winter.

Methods
Source data. We used the North American Dataset ("Northam") from the National Oceanic and Atmospheric Administration's (NOAA's) National Centers for Environmental Information 2 . The Northam data are generated from the Global Historical Climatology Network-Monthly (GHCN-M) dataset 7 . Northam has been the calibration dataset for the U.S. Historical Climatology Network (USHCN) since version 2.
We downloaded version "j" of Northam. Northam version "j" values were subjected to a pairwise homogenization algorithm described in Menne and Williams 8 . The nature and quality of these homogenized data have been analysed in numerous published articles [9][10][11] . GHCN-M quality control procedures are documented in Lawrimore et al. 7 . Observations flagged by NOAA through this process as having one or more quality control issues 7,12 were dropped from our analysis 13 . A description of the GHCN quality data flags ("QFLAG"s) is provided at https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt. Figure 1 is a map of the temperature and precipitation stations used for the analysis (see also 13 ). Figure 2 illustrates the number of stations by variable, country, and year. The number of Canadian precipitation station records increased from 528 in 1901 to over 2,000 from 1971 to 1993, then declined to under 1,000 by 2012. In contrast, U.S. records increased from just over 3,000 in 1901 to over 8,000 from 1950 onwards.

Spatial modeling. We used thin plate smoothing spline algorithms as implemented in ANUSPLIN 3 .
ANUSPLIN is a suite of FORTRAN programs, under development for more than 25 years, for applying thin plate spline data smoothing techniques to multi-variate data. ANUSPLIN has been used by researchers around the world [14][15][16] . Here we provide only a brief description of thin plate splines; readers are directed to Wahba 17 for a more detailed description. Hutchinson 18 gives a general model for a thin plate spline function f fitted to n data values z_i at positions x_i:

z_i = f(x_i) + ε_i, i = 1, …, n

in which the x_i refer to the independent variables, in this case longitude, latitude, and elevation multiplied by a factor of 100. Multiplying elevation by a factor of 100, which reflects the relative horizontal and vertical scales of atmospheric dynamics 19 , has been shown to improve predictive performance, as demonstrated by Hutchinson 20 and Johnson et al. 15 . The ε_i are zero-mean random errors that represent both measurement error and model deficiencies, reflecting localized effects below the resolution of the data network such as cold air drainage. Precipitation values were subjected to a square root transformation prior to surface fitting. The square root transformation reduces the skewness of the precipitation variable 19 , making the application of a fixed level of smoothing more consistent between small and large precipitation values across the data network. Tait et al. 21 confirmed that the square root transformation can yield a significant reduction in daily precipitation interpolation error. The transformation also makes the detection of data errors more consistent between small and large precipitation values.
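ANUSPLIN itself is FORTRAN and is not reproduced here. As a rough illustration only, a comparable fit can be sketched in Python using SciPy's thin plate spline radial basis interpolator; the station data, smoothing level, and elevation units below are invented, and RBFInterpolator is a stand-in for, not an implementation of, the ANUSPLIN algorithm:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
n = 200
lon = rng.uniform(-130.0, -60.0, n)      # degrees longitude (invented)
lat = rng.uniform(25.0, 60.0, n)         # degrees latitude (invented)
elev_km = rng.uniform(0.0, 3.0, n)       # station elevation in km (invented)
precip_mm = rng.gamma(2.0, 20.0, n)      # monthly precipitation in mm (invented)

# Independent variables: longitude, latitude, and elevation scaled by 100,
# reflecting the relative horizontal and vertical atmospheric scales.
X = np.column_stack([lon, lat, elev_km * 100.0])

# Square-root transform before fitting; back-transform by squaring.
spline = RBFInterpolator(X, np.sqrt(precip_mm),
                         kernel="thin_plate_spline", smoothing=50.0)
pred_mm = spline(X) ** 2   # non-negative by construction
```

Squaring the fitted values guarantees non-negative precipitation estimates, one practical benefit of the square root transformation noted above.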
The model solution was obtained by minimizing the generalised cross validation (GCV) 4 . The GCV is calculated by implicitly removing each data point and summing a suitably weighted square of the difference of each omitted data point from a surface fitted to all remaining data points 17 .
For large datasets, ANUSPLIN uses a sample of stations, called knots, to construct the thin plate smoothing spline surfaces. The use of knots reduces the computational complexity while still making use of every data point to calculate the fitted surface 3 . In this case, approximately 40% of data points were selected as knots.
In the course of modeling, we reviewed the signal, which is a diagnostic metric produced by ANUSPLIN. The signal statistic ranges between zero and the number of knots. Hutchinson and Gessler 22 suggest that the signal should generally be no greater than about half the number of selected data points. Models with a good signal provide a balance between data smoothing and exact interpolation, while models with signals approaching the number of data points result in a rougher surface that approaches an exact interpolation of the source data. Exact interpolations reflect a model that is less robust, particularly in regions with few stations.
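The signal is the trace of the influence matrix of the fitted surface, i.e. its effective number of parameters. The rule-of-thumb check above can be sketched as follows, with a toy diagonal influence matrix standing in for the one ANUSPLIN reports:

```python
import numpy as np

def signal_ratio(A, n_data):
    # Signal = trace of the influence matrix. Hutchinson and Gessler's rule of
    # thumb: the signal should be no more than about half the data points.
    signal = np.trace(A)
    return signal, signal <= 0.5 * n_data

A_toy = np.diag(np.full(100, 0.3))   # invented, heavily smoothed toy surface
sig, ok = signal_ratio(A_toy, 100)   # signal of 30 is well below half of 100
```

A signal approaching n_data would indicate near-exact interpolation and a less robust surface, as described above.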
www.nature.com/scientificdata www.nature.com/scientificdata/ During the course of model development, root generalized cross validation (RTGCV) values in the log files output by ANUSPLIN's SPLINE program were also reviewed. The RTGCV, the square root of the GCV described above, is an estimate of predictive standard error. Of note, the RTGCV is a conservative measure of standard error, because it includes data error as estimated by the ANUSPLIN program 3 .
ANUSPLIN flags data values that exceed a user-set threshold for studentized residuals, greatly improving the analyst's ability to systematically detect anomalous recorded values and potential errors. For the current analysis, we examined flagged cases with a studentized residual greater than 3.71, a threshold whose probability of exceedance corresponds to a student's t distribution 23 . Following a comparison of flagged values against observations from neighbouring stations, we removed cases with studentized residuals exceeding 3.71: 0.05% of observations for minimum temperature, 0.15% for maximum temperature, and 0.17% for precipitation 13 . While these observations were not used to develop the gridded estimates, they were retained to assess the quality of model predictions.
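The screening step can be sketched as follows. This is a simplified studentization using the pooled residual standard deviation (ANUSPLIN's own calculation accounts for the leverage of each observation), and the station values are invented:

```python
import numpy as np

def flag_anomalies(recorded, cv_estimate, threshold=3.71):
    # Flag observations whose approximately studentized residual exceeds the
    # threshold, corresponding to a small exceedance probability under a
    # Student's t distribution.
    resid = np.asarray(cv_estimate) - np.asarray(recorded)
    t = resid / resid.std(ddof=1)
    return np.abs(t) > threshold

rng = np.random.default_rng(2)
recorded = 20.0 + rng.normal(0.0, 1.0, 500)    # e.g. monthly Tmax in deg C
cv_est = recorded + rng.normal(0.0, 0.5, 500)  # cross-validation estimates
recorded[10] = 60.0                            # inject an implausible value
flags = flag_anomalies(recorded, cv_est)       # only the injected value flags
```

In practice each flagged case was then compared manually against neighbouring stations before removal, as described above.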
The CV estimates taken from SPLINE's output point cross-validation file were used to calculate the mean error (ME) statistics presented in this study, calculated as the cross-validation estimate less the recorded station value. In addition, the mean absolute error (MAE) was calculated for a set of 160 stations 13 for January, April, July, and October at 5-year intervals from 1905 to 2015. This set of stations was selected to provide a representative, high-quality sample; using all stations would under-represent northern stations. A list of all GHCN stations (including those outside North America) and their metadata is available at https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt. As a final summary, MEs and MAEs were calculated for all stations by season (winter: December, January, and February; spring: March, April, and May; summer: June, July, and August; and autumn: September, October, and November). SAS software, Version 9.4 of the SAS System for Windows, was used to calculate differences between the predicted and recorded values, as well as to conduct correlation (PROC CORR) analyses on MEs and MAEs for the set of 160 test stations. Error maps were created in ArcGIS 24 .
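The error summaries reduce to simple arithmetic. Our analysis used SAS, but the following Python sketch with invented values shows the calculation and the seasonal grouping described above:

```python
import numpy as np

# Month-to-season mapping used for the seasonal summaries
SEASON = {12: "winter", 1: "winter", 2: "winter",
          3: "spring", 4: "spring", 5: "spring",
          6: "summer", 7: "summer", 8: "summer",
          9: "autumn", 10: "autumn", 11: "autumn"}

def me_mae(cv_estimate, recorded):
    # ME = mean(estimate - recorded); MAE = mean(|estimate - recorded|)
    err = np.asarray(cv_estimate, float) - np.asarray(recorded, float)
    return err.mean(), np.abs(err).mean()

# Toy example: three station-months of maximum temperature (deg C, invented)
me, mae = me_mae([10.2, 9.5, 11.0], [10.0, 10.0, 11.0])
```

Note that ME can be near zero while MAE is not, since positive and negative errors cancel in the ME; this is why both statistics are reported.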

Data Records
Monthly grids of mean maximum/minimum temperature and total precipitation for 1901 to 2016, generated using a 60 arc-second (approximately 2 km) Digital Elevation Model covering the continental US and Canada, are archived at the World Data Center for Climate (WDCC) at DKRZ 1 . Post-2016 grids are regularly published to the same DOI, and may also be obtained by contacting the corresponding author.
These monthly historical spatial models cover the geographic area from −168° to −52° longitude and from 25° to 85° latitude from 1950 to 2016. Because of the small number of northern weather stations from 1901 to 1949, the monthly historical models over this period cover a reduced area (−168° to −52° longitude and 25° to 60° latitude).

The average RTGCV was 0.94 °C for maximum temperature, lower than that for minimum temperature (1.33 °C). The larger errors for minimum temperature are consistent with previous work and reflect shorter length scales and larger observational/representativeness errors for this variable compared to maximum temperature. As noted by Hutchinson et al. 18 , maximum temperature patterns are strongly controlled by ground elevation intersecting linear atmospheric lapse rates, but minimum temperature patterns are controlled by additional processes, including cold air drainage, which inverts local lapse rates, particularly in winter months 25,26 . Errors for maximum temperature are slightly larger in mid-summer, reflecting the greater variability of the higher temperature values 19 .

Technical Validation
RTGCVs for minimum temperature ranged from 1.19 °C in April to 1.4 °C in October. RTGCVs for maximum temperature were largest in July (1.03 °C) and lowest in October (0.88 °C). For precipitation, RTGCVs were largest in the summer (July and August), similar to the pattern reported by McKenney et al. 4 . This pattern reflects the greater spatial complexity of convective rainfall compared to the frontal precipitation that occurs in winter months. Consistent with Hutchinson et al. 19 , we found the lowest predictive errors in autumn for maximum temperature and precipitation, "consistent with the large well-organized synoptic systems that prevail in this season" (p. 725).

The signal-to-knots ratio for the current models ranged between 47.0% and 64.7% for minimum temperature, and between 39.5% and 60.0% for maximum temperature. For total monthly precipitation, the signal-to-knots ratio showed a wider range, from 26.4% to 62.9%, but no problematic surfaces, as defined by Hutchinson & Gessler 13,22 .
Figures 3 to 5 illustrate the MEs (Estimated less Recorded) and MAEs for January, April, July and October for the 160 test stations every five years from 1905 to 2015. Maximum temperature model errors were typically between −3.0 °C and 3.0 °C (Fig. 3). Minimum temperature predictive errors were larger than those for maximum temperature (Fig. 4). Model errors for total monthly precipitation (Fig. 5) were largest at the coasts in January. July precipitation errors were substantially smaller, falling mostly between ±10 mm, as compared to ±50 mm for January precipitation errors. Larger model errors were evident along the Pacific and Atlantic coasts for January precipitation, linked to heavy and highly variable winter precipitation events.
Across the entire dataset, the MAE for maximum temperature was 0.71 °C, compared to 1.02 °C for minimum temperature. MAEs were largest in summer for maximum temperature and in winter for minimum temperature, as shown in Table 2. MAEs for precipitation were largest in the summer (21% of the surface mean) compared to spring (17.9%). The spatial models exhibited a negative (or dry) bias relative to recorded values for precipitation of 0.5 mm, or 0.7% of the surface mean. In comparison, MEs for the temperature variables were less than 0.01 °C. Figure 6 illustrates the MEs (Estimated minus Recorded) and MAEs for January, April, July and October for the 160 test stations every five years from 1905 to 2015. The MEs for the maximum temperature models ranged between 0.0 °C and 0.3 °C (Fig. 6a), and from −0.3 °C to 0.1 °C for minimum temperature (Fig. 6b). The MAEs varied by year from 0.9 °C to 1.2 °C for minimum temperature, and from 0.5 °C to 0.9 °C for maximum temperature. MAEs showed a significant declining trend over time for April (maximum temperature), July (minimum temperature, maximum temperature), and October (maximum temperature), as shown in Table 3, suggesting an improvement in the quality of predictions over time. Temperature MEs were not significantly related to year (Table 4). Precipitation MEs were likewise not significantly correlated with year (Table 4), with the exception of July MEs, which showed a significant declining trend over time. Cross-validation estimates and recorded values for the 160 test stations (every five years from 1905 to 2015) can be obtained from MacDonald 13 .
McKenney et al. 4 assessed the 2006 models using a representative withheld sample of between 100 and 200 stations (increasing over the course of the century), which were not used in the creation of the spatial models. By comparison, the current study compared CV estimates to recorded values for 160 stations selected to be more representative than the full sample. We used a leave-one-out approach as opposed to withholding a set of stations simultaneously. We therefore urge some caution in directly comparing the MEs and MAEs between the two studies due to these methodological differences.

Usage Notes
The monthly historical spatial models presented in this manuscript will be of interest to researchers and practitioners who need historical estimates of temperature or precipitation variables for points or regions in North America. These temperature and precipitation estimates are central inputs to species richness 27 , plant hardiness 28 , forest productivity 29 , forest cover change 30 , carbon 31,32 , water budget 33 , and species distribution models 34 , as well as to determining the representativeness of different locations for conservation research 35 . Further, models based on gauge data are also used as inputs to satellite-based precipitation estimates 36 .
We note that there are regional limitations associated with the spatial models, especially for precipitation-related variables. Model predictions of precipitation along the coasts were associated with larger errors, suggesting that a distance-to-coast independent variable might improve these estimates. For applications confined to coastal areas of North America, more specialized gridded products may be more appropriate. However, we note that the paucity of station observations available in some regions to develop and calibrate such models is especially problematic in Canada.

Code availability
SAS code used for data preparation and analysis has been published at the Open Science Framework under the same name as the publication (https://doi.org/10.17605/OSF.IO/2DAK5) 37 . SAS code and output for the residual analysis can be accessed from this DOI.

Table 3. Trend in MAEs (January, April, July, and October for the 160 test stations every five years from 1905, 1910, …, 2015). Values shown are Pearson correlation coefficients (r) between MAE and year. Starred values indicate statistically significant results at the 0.01 level. Temperature MAEs are expressed in °C and precipitation MAEs as a percentage of total monthly recorded precipitation.