Introduction

The Chernobyl disaster on 26 April 1986 is considered the worst nuclear accident in history. It is one of only two events ever to be classified as “Level 7 (major accident)”, the highest possible level on the International Nuclear and Radiological Event Scale, a classification system used by the International Atomic Energy Agency (IAEA) to communicate safety information related to nuclear accidents. The only other “Level 7” event, the Fukushima Daiichi nuclear disaster, took place in 2011 in Japan as a consequence of the Tohoku earthquake and tsunami. The Chernobyl explosion and subsequent nuclear fire, which burned for ten days, led to the release of between 9.35 × 10³ and 1.25 × 10⁴ petabecquerel (PBq) of radionuclides into the atmosphere1. The radioactive contaminants released by the Chernobyl accident were subsequently deposited in surrounding areas of Belarus, Russia and Ukraine, but also elsewhere across Europe2 and even Asia and North America. The pattern of contamination is heterogeneous, owing to atmospheric conditions at the time of the accident (Fig. 1). Although many radionuclides, such as iodine-131, either dissipated or decayed within days, caesium-137 (¹³⁷Cs), strontium-90 (⁹⁰Sr), plutonium-239 (²³⁹Pu) and other transuranic elements, along with their decay products (e.g. americium-241), still persist in the environment even hundreds of kilometers from Chernobyl. Even 20 years after the accident, the amount of radioactive material remaining is enormous. Given the half-lives of ¹³⁷Cs, ⁹⁰Sr and ²³⁹Pu (30, 29 and 24,000 years, respectively), they are likely to have a significant impact over a long time period1.

Figure 1

Distribution of radioactive contamination in the Chernobyl region.

Adapted with permission from ref. 2.

Because of the unprecedented scale and global impact of the accident, it is no surprise that it generated significant interest in both the scientific community and the general public. As a result, numerous studies have been conducted to assess the consequences of Chernobyl for human health and agriculture, as well as its biological effects ranging from the level of DNA to entire ecosystems. There have also been several qualitative reviews attempting to summarize the findings. Some of these, notably the official UN reports3,4, paint an optimistic picture, stating that the consequences for human health and the environment are much smaller than expected; a theme since echoed and interpreted in the popular and scientific press5,6. These conclusions were disputed by other reviews1,7,8, which question the UN reports' methodology and claim that their optimism is unfounded. Surprisingly, given the controversy surrounding this subject, there have to date been no attempts to use rigorous quantitative methods to summarize the entire literature and reach empirically based conclusions.

In areas of public health concern such as toxicology and obesity, general quantitative assessment of scientific findings has been performed using meta-analysis (MA) for several decades because this method is rigorous, consistent and inclusive [ref. 9; see Methods for worked examples of estimation of effect size]. In the last 20 years, MA has also emerged as a preferred method for research synthesis in ecology and evolution, especially since enormous amounts of published and unpublished research have made the traditional narrative approach to research synthesis increasingly unfeasible10. MA has frequently been used in assessing effects of environmental perturbations, such as increased CO2 emissions, climate change, natural variation in radiation, as well as different types of contamination caused by humans11. Unlike traditional qualitative reviews, MA allows powerful quantitative assessment of the magnitude of effects because its high degree of objectivity is based on a standardized and repeatable set of statistical procedures. The specific procedures used in MA are mostly analogous to standard statistical methods, but the units of analysis are the results of independent studies rather than the independent responses of individual subjects12. Therefore, we believe that such an analysis in the study of low-dose radiation would be very helpful in addressing the problems of selective data interpretation and reviewer bias, which often affect narrative reviews9,12. Since ecological meta-analyses necessarily synthesize studies measured on different spatio-temporal scales, inclusion of heterogeneous data in the analyses is inevitable13, and the effects of such heterogeneity require additional exploration in almost all cases.

We reviewed and quantified the effects of radiation from Chernobyl on mutation rates across all taxa. Since ionizing radiation has been well established as a mutagen14,15, we predicted that increased exposure to radiation resulting from the accident should have a strong effect on mutation rates. A second objective was to assess whether there were differences in response to radiation among taxa, including the sensitivity of humans relative to other taxa and whether such differences could be explained by ecology or life history.

Results

Mean effect size across all studies was 0.665 (Fig. 2; 95% confidence intervals (CI) 0.585 to 0.733, N = 151). Thus, radiation effects accounted for 44.3% of the variance in the study sample. Variation in individual effect sizes was no larger than expected due to sampling error (QT = 136.54, df = 149, p = 0.759). The fail-safe number, i.e. the number of null results required to eliminate this mean effect, suggested that the overall effect size estimate was highly robust (Rosenberg's method: 4135 at p = 0.05), because it is highly unlikely that so many unpublished null results exist. When the analysis was repeated using a single effect size estimate for each species to account for non-independence of radiation response measurements within species, the overall effect size remained qualitatively similar at 0.727 (95% CI 0.549 to 0.842, N = 30 species), with radiation effects accounting for 52.8% of the sample variance. The fail-safe number for this second analysis was 1644 using Rosenberg's method. Compared to the mean effect sizes across all meta-analyses in biology, as reviewed by Møller & Jennions16 (E = 0.205; 95% CI 0.158–0.251), our mean effect size is significantly larger (Fig. 3; t = 15.367, df = 21, p < 0.00001), implying that this mean estimate is located in the extreme tail of the frequency distribution of mean effect sizes across all meta-analyses in biology17.
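The percentages of variance quoted above follow from squaring the correlation-type effect size. The short sketch below illustrates that arithmetic only; the function name is ours, and small differences from the reported percentages are to be expected because the published means are rounded to three decimals.

```python
def variance_explained(r):
    """Percentage of variance accounted for by a correlation-type effect size r."""
    return 100.0 * r ** 2


# Mean effect sizes and the percentages reported in the text.
reported = {
    "all studies": (0.665, 44.3),
    "one estimate per species": (0.727, 52.8),
}

for label, (r, pct) in reported.items():
    print(f"{label}: r = {r}, r^2 = {variance_explained(r):.1f}% (reported: {pct}%)")
```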

Figure 2

Plot of the 151 effect sizes of the relationship between mutation rates and radiation from Chernobyl, ordered by increasing effect.

Effect sizes are z-transformed Pearson product-moment correlation coefficient estimates, shown with the 95% confidence intervals. The vertical line represents an effect size of zero.

Figure 3

Frequency distribution of effect sizes from meta-analyses in biology (from ref. 24) and the effect size from the present study shown in dark green.

The methods used to identify mutations had a significant effect on reported effect sizes (Qb = 13.3954, df = 2, p = 0.001). The smallest mean effect size (E) of 0.329 (95% CI 0.040 to 0.567, N = 36) was found in studies where molecular analyses were used to identify mutations, with radiation effects accounting for 10.8% of the variance in that sample. For studies using cytogenetic methods to identify mutations, mean effect size was 0.699 (95% CI 0.572 to 0.794, N = 69), with radiation accounting for 48.9% of the sample variance. The largest effect size, 0.792 (95% CI 0.651 to 0.880, N = 36), was found in studies using phenotypic observations to determine mutation rates, which means that radiation effects account for 62.7% of the sample variance. However, when the studies based on phenotypic observations were conservatively excluded from summary analyses, the overall mean effect size was only slightly reduced compared to the initial estimate based on the full range of studies (E = 0.591, 95% CI 0.513 to 0.659).

We found no significant evidence of publication bias regardless of whether the rank correlation for estimating bias was performed between standardized effect size estimates and their variance (rbias = −0.018, N = 151, p = 0.83), between effect size estimates and study sample size (rbias = 0.015, N = 151, p = 0.86), or between effect size and year of publication (rbias = −0.113, N = 151, p = 0.17).
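As an illustration of this kind of indirect test, the sketch below computes a rank correlation between effect sizes and study sample sizes. It assumes Spearman's rho (the text does not state which rank correlation was used), and the data are hypothetical, not taken from the studies analysed here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical effect sizes and sample sizes, for illustration only.
effect_sizes = [0.62, 0.15, 0.80, 0.41, 0.70, 0.33]
sample_sizes = [40, 141, 25, 60, 32, 90]
print(f"r_bias = {spearman_rho(effect_sizes, sample_sizes):.3f}")
```

A strongly negative value in such a test would indicate that small studies tend to report larger effects, which is the pattern expected under publication bias.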

Taxon-specific effect sizes are listed in Table 1. Mean effect size was significantly larger for plants than animals (Qb = 5.044, df = 1, p = 0.025). Among animals, overall effect size for mammals was 0.598 (95% CI 0.480 to 0.695), which is indistinguishable from the overall effect size for all animals. This is unsurprising, given that the majority of animal studies were done on mammals, which account for 70 out of 81 data points. Among plants, cereal crops, comprising 10 of 68 data points, show an effect size of 0.624 (95% CI 0.454 to 0.749).

Table 1 Mean effect size and confidence intervals for different taxonomic groups based on different sample sizes (N). A separate unstructured, random-effects model meta-analysis was performed for each group to estimate mean effect size, 95% confidence intervals and Rosenthal's fail-safe number

Mean effect sizes for individual species for which more than one data point was obtained are listed in Fig. 4a, along with the bootstrap confidence intervals for the estimate. There is large and highly significant variation among species, with humans located in the middle – close to the median – with respect to effect size. Fig. 4b shows a dendrogram based on similarity in effect size for all species included in the study, constructed using Ward's minimum variance method18. Three distinct clusters are apparent and the relationships between effect sizes are not due to similarity caused by common phylogenetic descent. All three clusters contain both plants and animals.

Figure 4

Species-specific effect sizes of radiation on mutation rates differ widely among taxa.

(a) Mean effect size of radiation on mutation rates for individual species, with bootstrap CI of the estimate (10,000 iterations). Number of data points available for each species is indicated next to the species name. (b) Dendrogram based on effect size similarity for all species included in the study, constructed using Ward's minimum variance method. Sample size for each species is shown in parentheses.

We found a significant negative effect of the number of generations since the accident on effect sizes (r = −0.259, SE = 0.081, F1,92 = 10.136, p = 0.002). However, when separate analyses were carried out for plants and animals, this effect disappeared. For animals a very small but statistically significant positive effect was observed (r = 0.147, SE = 0.070, F1,58 = 4.426, p = 0.04), whereas for plants the model was not statistically significant (r = −0.211, SE = 0.325, F1,33 = 0.421, p = 0.52).

There was no significant difference in effect size between single-year and multiple-year studies (Qb = 0.881, df = 1, p = 0.35). The number of populations in the study had a small, but statistically significant effect on mean effect size (r = −0.031, p < 0.0001). However, due to its small magnitude, adjusting for this effect did not change the overall effect size estimate (E = 0.747, 95% CI 0.745 to 0.749), nor did repeating the summary analyses while excluding all studies with only two populations (E = 0.704, 95% CI 0.619 to 0.773).

Discussion

The magnitude of the biological effect revealed by our meta-analysis of radiation-caused mutations can be classified as unusually large. According to Cohen19, effects can be considered small, medium or large based on the proportion of variance in the sample that they explain (1%, 9%, or 25%, respectively). On average, the proportion of variance in main effects explained by biological meta-analyses is 5–10% and for many published studies it is much lower [24]. Therefore, our findings that radiation effects account for 44% of the variance in the sample and 53% if the effect sizes for each species are collapsed into a single mean effect, indicate an unusually strong effect. When separating studies according to the method used to identify mutations, we identified a medium sized effect according to Cohen's classification for molecular studies and large effect sizes for both cytogenetic and phenotypic observations. Among biological meta-analyses, even the effect size for molecular studies can be considered large.

Furthermore, for all higher taxonomic groupings and 17 out of 19 species for which the sample size was sufficient to calculate mean effect sizes, the sign of the effect was positive, indicating an increased mutation rate as a consequence of radiation exposure. These numbers are significantly different from the random expectation of half of the effects being negative and half positive (G = 11.54, df = 1, p < 0.001). Clearly, a broader taxonomic spread of estimates would have widened the scope for reaching more general conclusions, but given the difficulty of obtaining samples and the overall low level or complete absence of funding for many of the studies, achieving this will require a concerted effort. We also note that the 19 species cover all the major taxa including bacteria, plants and animals. Interspecific differences in effect size could be explained by differences in resistance to radiation due to physiological mechanisms, such as DNA repair, and to other factors such as life history and behavior. Part of the differences among higher taxonomic groups may also be attributable to physiological factors, while differences between plants and animals could partly be due to differences in genome size and ploidy, or to the sedentary nature of plants, which unlike animals are unable to temporarily or permanently move away from the most contaminated areas.

Studies of the effects of radiation on mutations are heterogeneous in terms of sample size, methods and sample material. Any attempt to homogenize such effects is bound to fail, if for no other reason than for logistic ones. We addressed the effects of such methodological issues in the statistical analyses, but still reached a qualitatively similar conclusion. When testing whether study design had an impact on the observed effect size, we found that “higher quality” studies, i.e. those conducted over multiple years and testing for effects of radiation on multiple populations, did not produce a significantly different result from the “lower quality” studies. This is perhaps not surprising given the magnitude of the overall effect size, which makes it more likely that an effect would be detectable even by studies with design constraints. Moreover, our analysis shows that it is unlikely that the large overall effect is a consequence of a specific systematic error in study design.

We identified clear and significant interspecific differences in effect size. The effect sizes covered almost the full range between 0 (no effect) and 1 (the largest possible effect size). However, the phylogenetic signal in those differences was very weak, i.e. the observed effect size values matched up poorly with the phylogenetic position of species. This result cautions against inferring susceptibility or mutational response to radiation for a given species solely based on the known responses of a related species. Factors such as DNA repair mechanisms, genome size, ploidy and life history could have an impact on how a given species responds in terms of mutations to elevated levels of ionizing radiation. Because radiation effects on humans are of particular interest for public health reasons, we also tested whether humans responded differently from other organisms in terms of effects of radiation on mutation rates. Humans demonstrated an intermediate susceptibility to radiation compared to other species.

Given that the studies covered a time period spanning 25 years, it was possible to test whether effect sizes remained constant over time. Across all taxa, we found a moderately negative effect of the number of generations since the accident on observed effect sizes. However, when the model is adjusted to account for different taxonomic groups included in the analysis, significance is lost and in the case of animals, the direction of the effect is reversed. It is therefore likely that the reduction in effect sizes with increasing number of generations since the accident is an artifact of heterogeneity in the data, rather than a real biological phenomenon. Hence there is no evidence of a decrease in the effects of Chernobyl radiation on mutations over time. Given the relatively long half-lives of major radioactive contaminants in the area, this is not surprising.

As with any other type of review, one of the most important problems in meta-analysis is study unavailability, also known in its more extreme form as the “file drawer” problem. The studies retrieved and used in the analysis may not be fully representative of all studies performed on a given subject. Therefore, it is conceivable that if those “missing” studies had been included, the review's conclusions would have been different. This is especially true when only published studies are used for reviews, as the probability of publication of a given study often depends on the statistical significance of its results12. However, unlike other types of literature reviews, meta-analysis includes methods for addressing this problem. Since it can be used to calculate both the overall numerical value for the effect size and its statistical significance, we can also calculate the number of studies, not included in the analysis and showing an average zero effect size, that would be required to reduce the statistical significance of the findings below the P = 0.05 threshold19. In our meta-analysis, these fail-safe numbers indicate that the estimates of effect size, given the number of studies and their sample sizes, are highly reliable and robust. It would require 4135 studies with a mean effect size of zero to reduce the statistical significance of our analysis to P > 0.05; the existence of so many unpublished studies is highly unlikely given the 151 published data points actually included in this meta-analysis. For the analysis using only one composite estimate per species, the fail-safe number from Rosenberg's method (1644) is again very large relative to the 30 species used for the analysis, indicating that this result is also very robust.
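To make the logic of a fail-safe number concrete, the sketch below implements Rosenthal's classic unweighted formula, N_fs = (ΣZ)² / z_α² − k. This is a simpler stand-in chosen for illustration: the fail-safe numbers reported above were obtained with Rosenberg's weighted method, which this code does not reproduce. The input Z scores are hypothetical.

```python
from statistics import NormalDist


def rosenthal_failsafe(z_scores, alpha=0.05):
    """Rosenthal's fail-safe N: the number of unpublished studies averaging
    Z = 0 needed to raise the combined one-tailed p-value above alpha."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # about 1.645 for alpha = 0.05
    k = len(z_scores)
    return (sum(z_scores) ** 2) / z_alpha ** 2 - k


# Hypothetical standard normal deviates from four studies.
z_scores = [2.0, 2.0, 2.0, 2.0]
print(f"fail-safe N = {rosenthal_failsafe(z_scores):.1f}")
```

The larger this number is relative to the number of studies actually analysed, the less plausible it becomes that unpublished null results could overturn the conclusion.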

We present the first quantitative review of the literature on biological consequences of the Chernobyl disaster. A meta-analytical approach to examine the relationship between radiation and mutation rates revealed a surprisingly large overall effect size, across all studies and taxa, with the mean effect size exceeding almost all other mean effect sizes reported in the biological sciences. This means that mutation rates increased strongly in contaminated areas compared to control sites with normal background radiation. Furthermore, there were significant differences in mean effect size among taxa, including interspecific differences. Finally, indirect tests for publication bias indicated a low likelihood of the existence of bias and fail-safe tests suggested that the overall effect size estimate is highly reliable.

Our study reaffirms that mutation rates differ widely even among closely related taxa, although based on our study it remains largely unclear why species differ in their resistance to radiation. The surprisingly high mean effect size suggests a strong impact of radioactive contamination on individual fitness, as well as potentially significant population-level consequences, even beyond the area contaminated with radioactive material. This study constitutes only the first step in quantitatively examining the effects of the Chernobyl disaster, and further application of meta-analysis to the numerous studies available on other subjects, especially those published in languages other than English, should lead to interesting insights into this important and occasionally controversial subject. Especially in light of recent events at the Fukushima Daiichi nuclear power plant in Japan, this approach may prove to be of practical value in predicting and mitigating the consequences of radioactive contamination. Future studies of the effects of radiation on mutation rates for the same species in Chernobyl and Fukushima would allow for tests of consistency across taxonomic and geographic scales.

Methods

Data were collected from published original studies that estimated the impact of radiation around Chernobyl on mutation rates. Initial data collection was performed using the keyword search on ISI Web of Knowledge (using the keyword “Chernobyl” in combination with keywords “mutations”, “mutation rate”, “abnormalities”, “radiation effects”), with each keyword search being performed independently to maximize reach. Studies in Russian and Ukrainian not accessible through Web of Knowledge were identified and obtained through references in the literature and direct inquiries to colleagues. From the preliminary group of studies obtained, studies not meeting the following inclusion criteria were eliminated: (i) studies examining the effects of radioactive contamination around Chernobyl; (ii) studies reporting at least one statistical test explicitly examining the relationship between radiation level and some measure of mutation rates; and (iii) studies from which those data could be extracted and converted to correlation coefficients as a measure of effect size. We used 26 April 2011, the 25th anniversary of the accident, as the cut-off date for finalizing data search and collection. In total 45 studies (see Supplementary Appendix I), covering 30 species, were found matching the search criteria, yielding 151 data points for the meta-analysis (Supplementary Appendix II). In the case of foreign language literature in Russian and Ukrainian, translations were obtained from colleagues in Ukraine. A typical original study used in this meta-analysis examined the differences in mutation rate across two or more populations in the areas around Chernobyl varying in their level of background radiation. The results of each reported statistical test (t-test, ANOVA, chi square, etc.) on the relationship between radiation level and mutation rates represented an individual data point for meta-analysis. 
These test statistics were first converted to z-transformed correlation coefficients (zr), also known as Fisher's z, following Rosenthal20. A more conservative approach would have been to use non-parametric Spearman rank order correlations rather than Pearson product-moment correlations. However, since many studies did not report the individual observations, we would have been forced to exclude numerous studies if we had followed this approach. Based on zr and study size, an effect size was calculated for each study using the MetaWin 2.1 statistical software package for meta-analysis21. The same software was then used to calculate the mean effect size across all the studies using an unstructured, random-effects model22. Confidence intervals for the mean effect size estimates were determined by taking individual effect sizes, weighting them according to sample size and performing resampling tests with 10,000 iterations. Random-effects models account for the fact that there is a true random component of variation in effect sizes among studies, in addition to sampling error. Given the broad scope and the range of taxa that this meta-analysis covers, this is a necessary and plausible assumption. If effect sizes are highly repeatable within species, individual data points for each species cannot be considered independent. In order to address this potential problem, the analysis was also repeated using a single mean estimate per species, weighted by sample size. To assess the robustness of the findings and address the “file drawer” problem, which states that null results showing weak effects are disproportionately likely to remain hidden in file drawers, fail-safe calculations were performed using Rosenberg's21 method. 
Indirect tests for publication bias in the studies used for the meta-analysis were made using rank correlation between effect size estimates and their variances, between effect size and sample size and between effect size and year of publication23,24,25.
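The pooling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MetaWin implementation: it uses the DerSimonian-Laird estimator for the between-study variance and a normal-approximation confidence interval, whereas the analyses reported here used resampling with 10,000 iterations. The function name and input data are hypothetical.

```python
import math


def random_effects_mean(rs, ns):
    """Pool correlation coefficients under a random-effects model.

    rs: correlation coefficients, one per study
    ns: corresponding sample sizes (each must exceed 3)
    """
    zs = [math.atanh(r) for r in rs]      # Fisher's z transformation
    vs = [1.0 / (n - 3) for n in ns]      # sampling variance of each z
    w = [1.0 / v for v in vs]             # fixed-effect (inverse-variance) weights
    sw = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw

    # Total heterogeneity statistic Q_T and the DerSimonian-Laird estimate
    # of the between-study variance tau^2 (truncated at zero).
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(zs) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lo, hi = z_re - 1.96 * se, z_re + 1.96 * se

    # Back-transform from Fisher's z to the correlation scale.
    return {"r": math.tanh(z_re), "ci": (math.tanh(lo), math.tanh(hi)),
            "Q": q, "tau2": tau2}


# Hypothetical correlations and sample sizes, for illustration only.
result = random_effects_mean([0.70, 0.35, 0.80, 0.55], [30, 141, 22, 60])
print(result)
```

When Q_T does not exceed its degrees of freedom, the estimated between-study variance is zero and the random-effects mean coincides with the fixed-effect mean.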

Since different methods used to identify mutations may have different sensitivities and constraints, it is conceivable that study results could differ when different methods are used. To test whether the method used to identify mutations had an impact on reported effect sizes, the data set was divided into three categories with respect to this factor: molecular analyses of changes in genes, cytogenetic analyses and indirect methods based on phenotypic consequences of mutations such as albinism, foetal mortality and pollen abortion. A random-effects categorical meta-analysis22 was then used to calculate mean effect sizes for each of those categories and compare those means.

We also expected to observe differences in effect sizes based on time elapsed between the accident and when a given study was conducted, due to decay of radionuclides over time. We used a linear model to test for this effect, adjusting for taxon by taking into account generation time for each species and weighting effect sizes by study size. The number of generations elapsed between the accident and the beginning of the respective studies, or mean time in case of multi-year studies, was calculated by dividing this time period by the average generation time of each species. “Number of generations since the accident” was then logarithmically transformed and incorporated into the model as an explanatory variable.
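The explanatory variable described above can be computed as in the following sketch. Two details are assumptions made for illustration: time is resolved to whole calendar years, and the logarithm is taken to base 10 (the base is not stated in the text). The function and parameter names are hypothetical.

```python
import math

ACCIDENT_YEAR = 1986


def log_generations(start_year, generation_time_years, end_year=None):
    """log10 of the number of generations between the accident and a study.

    Single-year studies use the start year; multi-year studies use the
    midpoint of the study period.
    """
    mid_year = start_year if end_year is None else (start_year + end_year) / 2.0
    elapsed_years = mid_year - ACCIDENT_YEAR
    generations = elapsed_years / generation_time_years
    return math.log10(generations)


# A species with a one-year generation time studied in 1996: 10 generations.
print(log_generations(1996, 1.0))
# The same species studied from 1991 to 1996: 7.5 generations at the midpoint.
print(log_generations(1991, 1.0, end_year=1996))
```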

Since studies differ not only in their sample sizes, but also in experimental design, it was necessary to test whether key features of experimental design had an impact on effect sizes. Two characteristics of particular interest were whether a given study was conducted during a single year or multiple years and how many independent populations were used to estimate radiation effects. These characteristics can be viewed as indicators of study “quality”, which may be important but is otherwise difficult to quantify. To test for effects of study period, which is a categorical variable, a random-effects categorical model was used. For the effects of the number of populations included, a random-effects continuous model was used22. Due to sample size constraints, univariate models were used in all cases and we were therefore unable to test for interaction effects of generation time, study quality, radiation dosage and other variables.

Meta-analysis procedures

We followed standard procedures for extracting effect sizes, which are all reported in Supplementary Table 2. We briefly illustrate how one such effect size was extracted. For Ellegren et al.26 (p. 594), a study covering the years 1991 and 1996 and based on 141 individuals, we recorded a Pearson product-moment correlation coefficient of 0.1674 as the effect size. This genetic study of mutations in birds, specifically the barn swallow Hirundo rustica, was based on microsatellites. The range of radiation was low according to our categorisation. The test statistic was a χ² value of 3.95 with 1 df in a study of N = 141 individuals. In this case Pearson r = √(χ²/N) according to ref. 27. Thus r = √(3.95/141), which equals 0.1674. This study was based on a blind test, showed a positive relationship between mutation rate and radiation, and was of medium duration. We used similar procedures to extract the other effect sizes.
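The worked example above can be reproduced in a few lines. The χ² conversion is the one given in the text; the conversion from a t statistic is the standard formula r = √(t² / (t² + df)) and is included only as an illustration of how other test statistics are typically handled. Function names are ours.

```python
import math


def r_from_chi2(chi2, n):
    """Correlation from a chi-square statistic with 1 df and sample size n."""
    return math.sqrt(chi2 / n)


def r_from_t(t, df):
    """Correlation from a t statistic; the sign of t is preserved."""
    return math.copysign(math.sqrt(t ** 2 / (t ** 2 + df)), t)


# Values from the barn swallow example: chi-square = 3.95, N = 141.
r = r_from_chi2(3.95, 141)
z_r = math.atanh(r)  # Fisher's z transformation of r
print(round(r, 4))   # 0.1674
```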