Sampling Biases in Datasets of Historical Mean Air Temperature over Land

Global mean surface air temperature (Ta) has been reported to have risen by 0.74°C over the last 100 years. However, the definition of mean Ta is still a subject of debate. The most defensible definition might be the integral of the continuous temperature measurements over a day (Td0). However, for technological and historical reasons, mean Ta over land have been taken to be the average of the daily maximum and minimum temperature measurements (Td1). All existing principal global temperature analyses over land rely heavily on Td1. Here, I make a first quantitative assessment of the bias in the use of Td1 to estimate trends of mean Ta using hourly Ta observations at 5600 globally distributed weather stations from the 1970s to 2013. I find that the use of Td1 has a negligible impact on the global mean warming rate. However, the trend of Td1 has a substantial bias at regional and local scales, with a root mean square error of over 25% at 5° × 5° grids. Therefore, caution should be taken when using mean Ta datasets based on Td1 to examine high resolution details of warming trends.

T he measurements of the daily maximum and minimum temperatures (T max and T min ) were developed in English-speaking countries once the maximum/minimum thermometers became widely used in approximately 1860 1 . The maximum (or minimum) thermometer is a unique thermometer in that its reading does not change after the air temperature (T a ) reaches T max (or T min ). Therefore, T max and T min can be easily measured by checking and resetting the thermometers once a day. The measurements of T max and T min have been accepted globally, and T d1 5 (T min 1 T max )/2 has already become the most common way to calculate mean T a 2 . In many weather stations, the measurements of T max and T min may be the only data source for historical mean T a and the only choice for a homogenous century-long analysis of mean T a [2][3][4] . Analyses of global mean T a and its changes are performed operationally by several groups, including the NOAA National Climatic Data Center (NCDC) Global Historical Climatology Network (GHCN) [5][6][7] , the Goddard Institute for Space Studies (GISS) 8 , and a joint effort between the Met Office Hadley Center and the University of East Anglia Climate Research Unit (CRUTEM4) 9,10 . All of the global temperature analyses over land performed by the aforementioned groups rely heavily on T d1 11,12 . These century-duration analyses have provided basic datasets for global and regional climate change detection and attribution 13 .
However, the use of T min and T max to calculate mean T a has been criticised because T min is the response of a shallow nocturnal stable boundary layer 12,14 , and its variation is sensitive to local land use/land cover 15 , surface wind speed and humidity, and downward long-wave radiation from greenhouse gases 16 . T a reaches T min in the early morning because of long-wave cooling and reaches T max in the early afternoon because of solar short-wave radiation heating (Fig. 1). However, T a does not change linearly and symmetrically with time ( Fig. 1); its diurnal curve depends on the proportion of surface absorbed energy partitioned into sensible and latent heat fluxes (or evapotranspiration) 17 , which is determined by the surface incident solar radiation, atmospheric downward longwave radiation 16 , and surface aridity 18 . Therefore, T d1 may deviate from real daily mean T a (i.e., T d0 ), for both climatology and long-term trends.
In this study, I make a first quantitative assessment of the bias of T d1 in quantifying trends of mean T a using hourly T a observations collected by the NCDC Integrated Surface Database (ISD) 19 , which has provided longterm hourly T a measurements at approximately 5600 globally distributed weather stations since the 1970s (see Figs. S1 and S2 for detailed information). a day without monitoring. The former introduces systematic bias to T d1 , and the later primarily introduces random bias. These two sources of bias are evaluated separately in Figs. 2 and 3, respectively.
The 24-hour T a from multi-year observations at each station, as shown for example in Fig. 1, are comprised of hourly T a observations. T d0 and T d1 are calculated from these composite values. Their differences therefore reflect the impact of the shape of the diurnal variation of T a on the climatology of T d0 and T d1 . The mean surface air temperature T d1 calculated from T max and T min has a substantially different climatology from that of the real mean surface air temperature T d0 (Fig. 2), with a root mean square error of T d1 2 T d0 of approximately 0.3 uC for global land measurements (Table 1). These differences in the climatologies between T d1 and T d0 are related to the asymmetric diurnal variation of T a (Fig. 1), which is determined by the surface energy balance. During cold seasons, T d1 overestimates the mean surface air temperature T d0 almost everywhere. This overestimation is higher in arid or semi-arid regions.
Two factors explain why T d1 2 T d0 is much higher in cold seasons than in warm seasons in arid/semi-arid regions. First, in early morning, a larger fraction of the surface absorbed energy is partitioned into sensible heat flux during the cold seasons 17 because the surfaces are covered by less vegetation and because it is drier in cold seasons than in warm seasons. These conditions result in a faster increase of T a in the morning under drier conditions because the sensible heat flux directly heats the air above the surface 18 . Second, in the afternoon, the surface long-wave cooling effect is more efficient (Fig. 1) in cold seasons because the compensating effect of atmospheric downward long-wave radiation is lower in the cold seasons due to their lower relative humidity 16 . The combination of these two factors results in higher values of T d1 2 T d0 during the dry cold seasons in arid/semiarid regions.
T d1 samples T a only twice, during the early morning and the early afternoon, leaving approximately two-thirds of the day unmonitored. The unmonitored times may miss important information about the impact of weather events, such as fronts, on T a that are important during the cold seasons in the northern high latitudes. The deviation of T d1 from T d0 may be strong, but it is likely to be randomly distributed. The deviation may be quantified by calculating the standard deviation of the daily T d1 2 T d0 . Fig. 3 confirms that the standard deviations of the daily T d1 2 T d0 are higher at the northern high latitudes in the cold seasons and in the global arid/semi-arid regions in both cold and warm seasons. In general, the standard deviations of the daily T d1 2 T d0 in the cold seasons are slightly larger than those in the warm seasons, with a mean value of approximately 0.6 uC (Table 1).
Figs. 2 and 3 show that the deviation of T d1 from T d0 depends on the asymmetric diurnal variation of T a , which is related to (a) the surface aridity and vegetation coverage and (b) the timing and frequency of weather events, i.e., front activity. Both aspects may change significantly under climate change conditions [20][21][22] , which may introduce a substantial bias to the trend of the mean surface air temperature through the use of T d1 . Fig. 2 implies that the bias of T d1 in quantifying the trend of the mean air temperature may be substantial in arid or semi-arid regions if the surface aridity or vegetation coverage changes. I, therefore, further assess the differences in the trends of T d1 versus T d0 using the NCDC ISD data. Fig. 4 shows the relative biases of the trends, and the statistical results are presented in Table 1. T d1 has a negligible impact on the long-term trends of the mean surface air temperature, i.e., the global mean warming rate ( Table 1).
The trends calculated from the monthly anomalies of T d1 , however, have significant biases relative to T d0 at the regional or local scales, with a root mean square error higher than 25% at a 5u 3 5u grid scale ( Table 1). The spatial coherence of biases in trends of the mean air temperature shown in Fig. 4 is minimal, in contrast with the results in Figs. 2 and 3. Three possible reasons may account for this minimal coherence: (1) the variable changes of the surface conditions occur in both direction and amount at different grids, (2) the measurement bias of the hourly T a that were used to calculate T max , T min , Because of long-wave cooling, T a reaches its daily minimum, T min , in the early morning. The temperature reaches its daily maximum T max in the early afternoon because of solar radiation heating. T a , however, does not change linearly with time. During the cold seasons ( Fig. 1b), T a reaches T max in the early afternoon but decreases quickly afterwards because surfaces receive less incident solar and long-wave radiation. Because of this asymmetric diurnal variation of T a , T d1 5 (T max 1 T min )/2 is 0.71 uC higher than T d0 , integrated from the hourly temperature observations. However, T d1 is almost equal to T d0 during the warm seasons. The figure was produced using MATLAB.

Discussion
In this study, the maximum and minimum hourly T a measurements are selected to represent T max and T min . However, the response time of the T max and T min thermometer is several minutes. This may introduce some bias to the conclusions of this study. Here, I use the T a data of five-minute resolution collected by the U.S. Climate Reference Network (USCRN) to demonstrate that it is reliable to use hourly T a measurements to represent T max and T min .
USCRN has operated since the year 2000 at a few sites and was officially and nationally commissioned by the National Oceanic and Atmospheric Administration (NOAA) in 2004 23,24 . The primary goal of USCRN is to provide long-term high-quality homogeneous observations, in particular, for T a and precipitation 25 ; its temperature system consists of a platinum resistance thermometer and an aspirated radiation shield 24,25 . At each site, the USCRN uses three such thermometers for inter-comparison and quality assurance 25 . These redundant high-level thermometers at each USCRN site guarantee their high-quality continuous observations of T a (the missing rate of its T a data is near zero). The data sampling rate of the USCRN thermometers is 5 seconds, and temperature signals averaged over 5-minute intervals are output.
USCRN provides a method for coupling its continuous measurements to the past observations 24,26 . T max and T min can be directly determined from its 5-minute averages 24,26 , from which T d1 is calculated. We processed all of the data in local solar time. Monthly averages are calculated only if the daily means are available on no less than 24 days in a month and if there are no gaps of four or more consecutive days. The trends of the T d0 and T d1 are calculated from the monthly anomalies after removing their seasonal cycles. Fig. 5 shows that the biases in the mean values of T d1 using the USCRN data are similar to those from the ISD, both in amount and spatial variability. The biases in the relative trend of T d1 using the USCRN data are of similar magnitude to those using the ISD data, but with different spatial pattern because the time durations of the two datasets are different. As data availability of the USCRN is temporally and spatially less than that of ISD, the main results in this study are reported using the ISD data.
What are the implications of the biases of T d1 and of its trend uncovered in this analysis? It was reported that the warming rate was stronger in the cold-season in semi-arid regions using observations based on T d1 27 . Observations of global mean temperature that are primarily based on T d1 , including CRUTEM3, GISS, and NCDC, were used to evaluate T d0 from the simulation of the Intergovernmental Panel on Climate Change Fourth Assessment Report (IPCC AR4) climate models. It was found that the model simulations of T d0 did not simulate the full extent of the observed  winter time warming of T d1 at the high-latitude Northern Hemisphere 28 . As shown in Fig. 2, T d1 will overestimate the trend of mean T a if the area gets drier in the arid/semi-arid regions.
Observed evidences show it is the case as middle latitude arid or semi-arid regions have been drier in recent decades 29,30 . This partly explains the enhanced warming in semi-arid regions in cold seasons. Although mean temperature, T d1 5 (T max 1 T min )/2, is perhaps the most common method used to calculate mean air temperature, it is the not the only option. Scandinavian countries developed a special formula to estimate mean T a 31 . For example, Sweden still uses the Ekholm-Modén formula from 1916, in which the mean T a is a linear combination of the T max , T min , and measurements of T a at 6, 12, and 18 h UTC 31 . Another option is to calculate the mean surface air temperature from T a measurements at 00, 06, 12 and 18 h UTC (T d2 ), which are recently available through the World Meteorological Organization's (WMO) global telecommunication system 32 . These data have been used in climate-related research 33 . The significant differences in the climatology of T d1 and T d0 (or T d2 ), as shown in Fig. 2 (or Fig. S3), preclude switching from the use of T d1 to T d0 (or T d2 ) for estimating the long term variability of the mean surface air temperature, although T d0 or T d2 is already globally available (Figs. 2 and S2). To produce a century-long homogenous dataset of mean T a , it is essential to continue to use the measurement methods currently used at the weather stations 3,4 . This study indicates that the use of T d1 has a negligible impact on the global mean warming rate. However, T d1 cannot accurately reflect the impact of  The data shown here are integrated into 5u 3 5u grids from approximately 5600 weather stations with different data periods for each station (see Fig. S2 for detailed information). T d1 5 (T max 1 T min )/2 is calculated from the daily maximum temperature (T max ) and minimum temperature (T min ), and T d0 is integrated from the hourly values at the NCDC ISD stations. The dots indicate that the trends of T d1 2 T d0 pass the a 5 0.05 Student's t-test, and the small pluses indicate that the trends do not pass the confidence test. The positive values indicate that T d1 overestimates the long-term trend of the mean surface air temperature. The figure was produced using MATLAB. the changes in the surface conditions on the variability of T a . These changing surface conditions cause the diurnal curve of T a to vary with time, thereby resulting in the deviation of T d1 from T d0 . Therefore, the trend of T d1 has a substantial bias at regional and local scales, with a root mean square error of over 25% for 5u 3 5u grids. Careful attention should be paid when using mean surface air temperature T d1 on studies regarding the spatial patterns of warming. For such studies, recently available hourly measurements 19 are recommended.
Factors that can introduce inhomogeneity to the mean T a have been reviewed by Jones et al. 34 . In particular, by assuming that a departure from differently derived mean values are comparable, Bradley and Jones 35 inferred that the monthly temperature values and the hemispherical estimates from different definitions were the same if calculated as anomalies from a selected period. This study confirms this assumption by demonstrating that the global mean trend calculated from the monthly anomalies of T d1 is the same as that of T d0 . However, the trends calculated from the monthly anomalies of T d1 at 5u 3 5u grids have a significant error, with a standard deviation of 25%. This large error is caused by the variation of T d1 2 T d0 with changing surface conditions (Fig. 2) as a result of the changing diurnal curve of T a (Fig. 1).
The use of T max and T min to calculate the mean T a over land is a result of the fact that many places have used inexpensive instruments that only measure those two temperatures (and could only be checked by an observer once a day) for a long time. T min is more sensitive to changes in land cover 15 and atmospheric downward longwave radiation 12 , while the diurnal temperature range (T max 2 T min ) is more sensitive to changes in land-atmosphere turbulent fluxes 17 , which are driven by variations in the surface incident solar radiation 18 caused by the changes in the clouds and aerosols 36,37 . Observations from T max and T min are therefore very important because of their relationship to climate change impacts and their connection to the global energy balance.

Methods
The hourly observations of T a over global land collected by the NCDC ISD project 19 were used in this study. The ISD compiles data from over 100 original data sources that archive hundreds of meteorological variables. The ISD has archived 2 billion surface weather observations from over 20,000 stations worldwide from 1900 to the present. Currently, the ISD database is updated with observations from over 11,000 active stations on a daily basis.
The ISD provides consistent and standardised quality control of the global hourly meteorological observations 19 . The ISD contains 54 quality control (QC) algorithms that serve to process each data observation through a series of validity checks, extreme value checks, internal consistency checks, and external continuity checks. Among all of the parameters, temperatures are among the most extensively validated parameters. The ISD data can be freely downloaded from www.ncdc.noaa.gov/oa/climate/ isd/index.php.
As of August 2013, there were approximately 5600 stations reporting hourly T a measurements for more than five years (Fig. S1). The ISD data were reported in UTC time and converted into local solar time for the purpose of my analysis. To obtain the bias and to reduce the impact of missing data, I produced a composite of the 24 hourly values at each site, i.e., all of the observations were averaged into hourly values, and the 24-hour values of the T a were obtained for each site. T min and T max were selected from the 24-hour values, from which T d1 was calculated. T d0 was integrated from the 24-hour values. In this study, I split a year into cold seasons (November to April in the Northern Hemisphere, or May to October in the Southern Hemisphere) and warm seasons (May to October in the Northern Hemisphere, or November to April in the Southern Hemisphere). The climatological differences of T d1 and T d0 were aggregated into a 5u 3 5u grid and are shown in Fig. 2. The composite method used here substantially reduces the impact of missing data on the results in Figs. 1 and 2. If there are no missing data, the results shown in Figs. 1 and 2 should be equal to those based on the daily basis, provided that the day is defined as being from midnight to midnight.
The trends shown in Fig. 4 were calculated differently. T min and T max were first selected from the 24-hour observations for each day at every site. The data were regarded as reliable only if the hourly temperatures were available for more than 22 h a day, from which the daily and monthly T d0 and T d1 were calculated. The monthly values were regarded as reliable only if the daily values were available for more than 15 days a month. The requirement for hourly air temperature measurements is stricter than that for daily values because this study focuses on the difference of T d0 and T d1 , which is dominated by the diurnal curve of air temperature. The monthly anomalies of T d1 and T d0 were calculated at each weather station by removing their averaged seasonal cycle. The monthly anomalies were then aggregated into 5u 3 5u grid values. The grid averaged-monthly anomalies were regarded as reliable if the data for each month was available at more than 50% of the stations within the grid. The trends of T d0 and T d1 2 T d0 calculated from the grid monthly anomalies are presented in Fig. 4 and Table 1. The data duration of T d1 and T d0 at the 5u 3 5u grid can be found in Fig.  S2.
The measurements of T min and T max were developed in English-speaking countries where the maximum/minimum thermometers were widely used since approximately 1860 1 . A maximum thermometer is a unique mercury thermometer that functions by having a constriction in the neck close to the bulb. The mercury is forced up through the constriction by the force of expansion as the temperature increases. When there is a decrease in the temperature, the volume of mercury contracts but cannot return to the bulb because of the narrow of the bulb neck. As a result, the column of mercury breaks at the constriction and remains stationary in the tube. The minimum thermometer works similarly, but it does so with steel pin immersed in clear liquid (i.e., ethyl alcohol) in glass.
The measurements of T max and T min can be made by one visit to the weather station a day. Because of its low cost, the measurements of T max and T min have been accepted globally, and T d1 5 (T min 1 T max )/2 has become the most common method to calculate the mean surface temperature. For most weather stations, the measurements using this method may be the only data source for historical temperature.
The weather stations of ISD directly measured T max and T min , in addition to the hourly temperature. The T max and T min temperatures were defined as the highest and lowest temperatures to have occurred during the past 24 hours. However, the T max or T min measurements may depend on the observation schedule, which may be different from country to country. The definition of a day is therefore different, such as from midnight to midnight or from noon to noon. In Europe, T min and T max are usually reported for 12-hour intervals ending at 6 UTC and 18 UTC and are not necessarily the true T min and T max in many regions, especially during the winter months 38 . This discrepancy occurs because 6 UTC is before the climatologically coldest hour sunrise in the winter in some regions (i.e., Europe) 33 and is also partly a result of the synoptic weather variability. This discrepancy introduces a significant error in the estimations of daily T max and T min 33 . The changes of the observation schedules may also introduce inhomogeneity of the climatology of the surface mean air temperature 38 . These problems can be avoided either by maintaining an unchanged observation schedule at a station or by using hourly observations, as I did here. The bias caused by changes to the observation schedules of T min and T max may be important 15,38 , but are not discussed here because the information on observation schedules is not yet publicly available. Two primary sources of bias of T d1 are discussed in this study (Figs. 2 and 3). They have physical meanings and may introduce substantial bias to trends of mean surface air temperature T d1 . The differences among the trends of T d1 and T d0 (Fig. 4), which are much less sensitive to the definition of the day, are calculated using the definition of day as midnight to midnight.