The continental runoff data set covers the period 1875–1994, and the start and end dates of runoff observations are listed2 for each of the 221 constituent stations. A wavelet-based runoff 'reconstruction' methodology2 is used to infill any missing data and to extend records forwards (to 1994) and backwards (to 1875) where necessary. Concerns have already been raised3 about the use of runoff stations subject to confounding anthropogenic influences, such as reservoirs, and about aspects of the runoff reconstruction methodology. In addition, the reconstruction appears to be based on the best correlation against one of ten unspecified reference stations2. There is no discussion of whether these reference stations are spatially and temporally representative of the full range of runoff regimes being reconstructed around the world, or of whether they are free of confounding anthropogenic influences. The sensitivity of the final reconstructed runoff to the choice of these reference stations is not properly addressed2,4.

A fundamental requirement of a runoff data set used in an attribution study, such as that by Gedney et al.1, is that it is representative of the observed runoff conditions around the world. The concerns outlined above and previously3 about the reconstruction methodology call into question the degree to which the runoff data set used2 is representative of those conditions. Key to this is the percentage of continental runoff that is reconstructed (infilled or extrapolated), which is not presented1,2,4. Based on the start and end dates of the runoff records2, extrapolation from 10–20 years of observed runoff data to the centennial scale4 has been applied to at least 34% of the stations. This figure is likely to underestimate the extent of reconstruction, because not all stations have complete records between their start and end dates.
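The kind of tally described above can be sketched in a few lines. The following is a minimal illustration only, assuming each station is described by nothing more than its first and last year of observation. The `StationRecord` class, its field names and the 20-year threshold are hypothetical choices made for the example, and the sketch does not reproduce the 34% figure, which depends on the published station list2.

```python
from dataclasses import dataclass


@dataclass
class StationRecord:
    """Hypothetical container for one station's observation dates."""
    name: str
    first_year: int
    last_year: int

    @property
    def span_years(self) -> int:
        # Inclusive span between first and last observed year.
        # Gaps inside the span are ignored, so this is an upper
        # bound on the number of years actually observed.
        return self.last_year - self.first_year + 1


def share_with_short_records(records, max_years=20):
    """Fraction of stations whose record span is at most max_years."""
    if not records:
        raise ValueError("no station records supplied")
    short = [r for r in records if r.span_years <= max_years]
    return len(short) / len(records)


def extrapolated_fraction(record, period_start=1875, period_end=1994):
    """Fraction of the analysis period lying outside a station's record."""
    period_years = period_end - period_start + 1
    # Clip the record to the analysis period before counting years.
    first = max(record.first_year, period_start)
    last = min(record.last_year, period_end)
    observed_years = max(0, last - first + 1)
    return 1.0 - observed_years / period_years
```

Because the span is computed from start and end dates alone, any share obtained this way is a lower bound on the true extent of reconstruction wherever records contain internal gaps.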

Gedney et al.1 analyse continental runoff records based on “observations from at least 20% of the total river basin area”, which could be taken to mean that up to 80% of the continental runoff is reconstructed, not observed, for some periods of their analysis. Without knowing the degree to which the runoff data set is reconstructed, and therefore how representative it is, conclusions based on replicating that runoff data in a modelling analysis (an attribution study, for example) are speculative.
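One way such a figure could be reported is as an area-weighted observed fraction for each year. The sketch below is an assumption-laden illustration rather than the procedure used by Gedney et al.1: it treats basin area as a stand-in for a station's contribution to continental runoff, ignores gaps within a record, and uses hypothetical function names and an input layout of `(basin_area, first_year, last_year)` tuples.

```python
def observed_area_fraction(stations, year):
    """Share of total basin area with an observing station in `year`.

    `stations` is a list of (basin_area, first_year, last_year) tuples;
    this layout is an assumption made for the example.
    """
    total_area = sum(area for area, _, _ in stations)
    if total_area <= 0:
        raise ValueError("total basin area must be positive")
    observed_area = sum(
        area for area, first, last in stations if first <= year <= last
    )
    return observed_area / total_area


def reconstructed_area_fraction(stations, year):
    """Complement of the observed share: area relying on reconstruction."""
    return 1.0 - observed_area_fraction(stations, year)


def years_meeting_threshold(stations, years, threshold=0.2):
    """Years in which the observed area share reaches the threshold."""
    return [
        y for y in years
        if observed_area_fraction(stations, y) >= threshold
    ]
```

With a threshold of 0.2, a year in which only one-fifth of the basin area is observed would pass, even though the remaining four-fifths is reconstructed; publishing the yearly reconstructed fraction alongside the runoff series would make this explicit.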

Considering the serious concerns about the continental runoff data set underlying the results of Gedney et al.1, their conclusion that an increase in twentieth-century continental runoff is attributable to the suppression of plant transpiration by CO2-induced stomatal closure is called into question. This highlights the need for the development of a quality-controlled, freely available, global runoff data set, which can be used with confidence (or at least informed caution) in future studies.