Fu et al. (ref. 6) linearly combine MSU channels 2 and 4, using coefficients estimated by linear regression on a single, monthly mean radiosonde data set, to create an effective weighting function that minimizes the effect of the stratosphere. For this approach to be valid on all space and time scales, the structure of stratospheric temperature variability must be stationary; this is not the case in reality. For example, the quasi-biennial oscillation (ref. 7) has a temperature response of more than 1 K at altitudes above the 100-hPa level, where the weightings of Fu et al. are negative, but little signal below that level, where the weightings are positive. The method of Fu et al. will therefore alias an inverse quasi-biennial oscillation signal into the tropical tropospheric record, something not apparent in radiosonde observations.
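
The regression step described above can be illustrated with a minimal sketch. It is not the code or data of Fu et al.: the function names, the synthetic series and the assumed 15% stratospheric contribution to channel 2 are all hypothetical, chosen only to show how a negative channel-4 coefficient arises when channel 2 is partly stratospheric.

```python
import numpy as np

def fit_channel_coefficients(t2, t4, t_target):
    """Least-squares fit of t_target ~ a0 + a2*t2 + a4*t4; returns (a0, a2, a4)."""
    design = np.column_stack([np.ones_like(t2), t2, t4])
    coeffs, *_ = np.linalg.lstsq(design, t_target, rcond=None)
    return coeffs

def combine_channels(t2, t4, coeffs):
    """Apply fitted coefficients to form the combined tropospheric estimate."""
    a0, a2, a4 = coeffs
    return a0 + a2 * t2 + a4 * t4

# Synthetic monthly anomalies (K); every number here is an illustrative assumption.
rng = np.random.default_rng(0)
n_months = 288
trop = rng.normal(0.0, 0.3, n_months)    # toy tropospheric layer temperature
strat = rng.normal(0.0, 0.8, n_months)   # toy stratospheric temperature
t2 = 0.85 * trop + 0.15 * strat          # assumed: channel 2 is 15% stratospheric
t4 = strat.copy()                        # assumed: channel 4 is purely stratospheric

coeffs = fit_channel_coefficients(t2, t4, trop)
t_combined = combine_channels(t2, t4, coeffs)
```

In this toy set-up the fitted channel-4 coefficient is negative because it removes the stratospheric part of channel 2. The coefficients are only as good as the assumed, fixed mixing; if the vertical structure of variability changes, the same coefficients no longer cancel the stratospheric signal.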

Fu et al. trained and tested both their channel-2 and channel-4 coefficients on the same radiosonde data (ref. 5), which can lead to overfitting and false agreement (ref. 8). They found a global-average trend difference between their estimated value and actual 850–300-hPa temperatures (T850–300) of 0.001 K per decade. We believe that this result is misleading: their statistical model could have been independently confirmed by at least one other vertically resolved radiosonde data set (ref. 4), a reanalysis (ref. 9), a climate model forced with observed sea surface temperatures and anthropogenic and natural forcings (ref. 10) or a coupled climate model forced with anthropogenic and natural forcings (ref. 11). We carried out such tests for tropical trends (Table 1). Discrepancies between tropical trends computed using the method of Fu et al. (Tfjws) and tropical T850–300 trends range from −0.02 to 0.06 K per decade, with root-mean-square values ranging from 0.03 to 0.09 K. Except for HadRT2.1s, the T2LT trend (where T2LT is a synthetic channel for the lower-middle troposphere) is a better estimate of the T850–300 trend than the Tfjws trend. The Tfjws trends are generally larger than the T850–300 trends, suggesting that the approach of Fu et al. has a warm bias.
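
The difference between in-sample and independent testing can be sketched as follows. This is a toy illustration under stated assumptions, not a reproduction of Table 1: the helper names, the synthetic trends and the two assumed stratospheric fractions (0.15 for the training set, 0.20 for the independent set) are hypothetical.

```python
import numpy as np

def trend_per_decade(monthly_series):
    """Ordinary least-squares linear trend of a monthly series, per decade."""
    years = np.arange(monthly_series.size) / 12.0
    return 10.0 * np.polyfit(years, monthly_series, 1)[0]

def fit(t2, t4, target):
    """Least-squares coefficients (a0, a2, a4) for target ~ a0 + a2*t2 + a4*t4."""
    design = np.column_stack([np.ones_like(t2), t2, t4])
    return np.linalg.lstsq(design, target, rcond=None)[0]

def predict(t2, t4, coeffs):
    return coeffs[0] + coeffs[1] * t2 + coeffs[2] * t4

def make_toy_dataset(seed, strat_fraction, n_months=288):
    """Synthetic data set; strat_fraction is the assumed stratospheric share of channel 2."""
    rng = np.random.default_rng(seed)
    years = np.arange(n_months) / 12.0
    trop = 0.01 * years + rng.normal(0.0, 0.2, n_months)    # assumed tropospheric warming
    strat = -0.05 * years + rng.normal(0.0, 0.5, n_months)  # assumed stratospheric cooling
    t2 = (1.0 - strat_fraction) * trop + strat_fraction * strat
    return t2, strat, trop

train_t2, train_t4, train_target = make_toy_dataset(seed=1, strat_fraction=0.15)
test_t2, test_t4, test_target = make_toy_dataset(seed=2, strat_fraction=0.20)

coeffs = fit(train_t2, train_t4, train_target)

# Trend discrepancy (estimate minus target), in K per decade.
in_sample = (trend_per_decade(predict(train_t2, train_t4, coeffs))
             - trend_per_decade(train_target))
out_of_sample = (trend_per_decade(predict(test_t2, test_t4, coeffs))
                 - trend_per_decade(test_target))
print(f"in-sample discrepancy:     {in_sample:+.4f} K per decade")
print(f"out-of-sample discrepancy: {out_of_sample:+.4f} K per decade")
```

By construction the in-sample discrepancy is essentially zero, because the toy relation is exact on the training data. The independent data set, whose vertical structure differs, shows a non-zero trend discrepancy, which is the kind of error that testing on the training data cannot reveal.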

Table 1 Tropical trends for deep-layer temperatures for 1 December 1978 to 1 December 2002

Trend discrepancies and root-mean-square values are smaller when Tfjws is compared with 1,000–100-hPa temperatures (T1,000–100), with less evidence of systematic bias (Table 1). This is probably because of the form of the effective weighting function. However, differences of −0.01 to 0.02 K per decade remain between T1,000–100 and Tfjws trends, about 10% of the observed surface tropical warming.

Average tropospheric temperature trends derived from an ensemble of coupled atmosphere–ocean model simulations are similar to those in the atmosphere-only case (Table 1). However, tropospheric trend ranges in the coupled simulations are larger than in the atmosphere-only case and so are consistent with the trend estimated from one processing of the MSU record (ref. 3). This demonstrates that ignoring observed changes in sea surface temperature leads to a weaker test of model–data consistency.

We re-estimated the channel-2 and -4 coefficients of Fu et al.6 using HadRT2.1s (ref. 4), rather than the radiosonde data set of ref. 5. Our coefficients differ from those of Fu et al. and are sensitive to the choice of training period, with a total uncertainty of the order of 10% for global and tropical coefficients, corresponding to a trend uncertainty of 0.01 to 0.02 K per decade.
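
The link between coefficient uncertainty and trend uncertainty follows from the linearity of a least-squares trend. As a rough sketch, write the estimate as a fixed linear combination of channel 2 and channel 4 with coefficients a_2 and a_4; the symbols below are notation introduced here for illustration and are not taken from the original analysis:

```latex
\begin{align}
b_{\mathrm{est}} &= a_2\, b_2 + a_4\, b_4, \\
|\delta b_{\mathrm{est}}| &\le |\delta a_2|\,|b_2| + |\delta a_4|\,|b_4|
  \le \varepsilon \left( |a_2 b_2| + |a_4 b_4| \right),
\end{align}
```

where b_2 and b_4 are the channel-2 and channel-4 trends over the same period (held fixed), b_est is the trend of the combined estimate, and ε is the fractional uncertainty of each coefficient (of the order of 0.1 here). The bound assumes that the coefficients are constant in time and that each is uncertain by at most the fraction ε.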

Although the approach of Fu et al. is novel, independent data indicate that it contains significant uncertainty. HadAM3 and the GISS model (ref. 12), forced with observed sea surface temperatures and forcing reconstructions, show significantly greater warming than all ‘observed’ data sets. Resolving differences between models and observations requires good experimental design and process-based studies using physical understanding.