Nature | Comment

Policy: Twenty tips for interpreting scientific claims

This list will help non-scientists to interrogate advisers and to grasp the limitations of evidence, say William J. Sutherland, David Spiegelhalter and Mark A. Burgman.



Science and policy have collided on contentious issues such as bee declines, nuclear power and the role of badgers in bovine tuberculosis.

Calls for the closer integration of science in political decision-making have been commonplace for decades. However, there are serious problems in the application of science to policy — from energy to health and environment to education.

One suggestion to improve matters is to encourage more scientists to get involved in politics. Although laudable, it is unrealistic to expect substantially increased political involvement from scientists. Another proposal is to expand the role of chief scientific advisers1, increasing their number, availability and participation in political processes. Neither approach deals with the core problem of scientific ignorance among many who vote in parliaments.

Perhaps we could teach science to politicians? It is an attractive idea, but which busy politician has sufficient time? In practice, policy-makers almost never read scientific papers or books. The research relevant to the topic of the day — for example, mitochondrial replacement, bovine tuberculosis or nuclear-waste disposal — is interpreted for them by advisers or external advocates. And there is rarely, if ever, a beautifully designed double-blind, randomized, replicated, controlled experiment with a large sample size and unambiguous conclusion that tackles the exact policy issue.

In this context, we suggest that the immediate priority is to improve policy-makers' understanding of the imperfect nature of science. The essential skills are to be able to intelligently interrogate experts and advisers, and to understand the quality, limitations and biases of evidence. We term these interpretive scientific skills. These skills are more accessible than those required to understand the fundamental science itself, and can form part of the broad skill set of most politicians.

To this end, we suggest 20 concepts that should be part of the education of civil servants, politicians, policy advisers and journalists — and anyone else who may have to interact with science or scientists. Politicians with a healthy scepticism of scientific advocates might simply prefer to arm themselves with this critical set of knowledge.

We are not so naive as to believe that improved policy decisions will automatically follow. We are fully aware that scientific judgement itself is value-laden, and that bias and context are integral to how data are collected and interpreted. What we offer is a simple list of ideas that could help decision-makers to parse how evidence can contribute to a decision, and potentially to avoid undue influence by those with vested interests. The harder part — the social acceptability of different policies — remains in the hands of politicians and the broader political process.

Of course, others will have slightly different lists. Our point is that a wider understanding of these 20 concepts by society would be a marked step forward.

Differences and chance cause variation. The real world varies unpredictably. Science is mostly about discovering what causes the patterns we see. Why is it hotter this decade than last? Why are there more birds in some areas than others? There are many explanations for such trends, so the main challenge of research is teasing apart the importance of the process of interest (for example, the effect of climate change on bird populations) from the innumerable other sources of variation (from widespread changes, such as agricultural intensification and spread of invasive species, to local-scale processes, such as the chance events that determine births and deaths).

No measurement is exact. Practically all measurements have some error. If the measurement process were repeated, one might record a different result. In some cases, the measurement error might be large compared with real differences. Thus, if you are told that the economy grew by 0.13% last month, there is a moderate chance that it may actually have shrunk. Results should be presented with a precision that is appropriate for the associated error, to avoid implying an unjustified degree of accuracy.
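A short simulation makes this concrete. The 0.13% growth figure comes from the text; the size of the measurement error is an assumption chosen only for illustration:

```python
import random

random.seed(1)

TRUE_GROWTH = 0.13     # growth figure from the text, in %
MEASUREMENT_SD = 0.2   # assumed standard deviation of measurement error, in %

# Simulate repeating the same measurement many times.
readings = [random.gauss(TRUE_GROWTH, MEASUREMENT_SD) for _ in range(10_000)]

# Fraction of readings that would (wrongly) suggest the economy shrank.
frac_negative = sum(r < 0 for r in readings) / len(readings)
print(f"share of readings below zero: {frac_negative:.2f}")
```

With error of this assumed size, roughly a quarter of the repeated measurements come out negative even though the economy truly grew.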

Bias is rife. Experimental design or measuring devices may produce atypical results in a given direction. For example, determining voting behaviour by asking people on the street, at home or through the Internet will sample different proportions of the population, and all may give different results. Because studies that report 'statistically significant' results are more likely to be written up and published, the scientific literature tends to give an exaggerated picture of the magnitude of problems or the effectiveness of solutions. An experiment might be biased by expectations: participants provided with a treatment might assume that they will experience a difference and so might behave differently or report an effect. Researchers collecting the results can be influenced by knowing who received treatment. The ideal experiment is double-blind: neither the participants nor those collecting the data know who received what. This might be straightforward in drug trials, but it is impossible for many social studies. Confirmation bias arises when scientists find evidence for a favoured theory and then become insufficiently critical of their own results, or cease searching for contrary evidence.
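Publication bias in particular lends itself to a sketch. The numbers below (true effect, noise, trial size) are invented; the point is only that filtering on 'significant' results inflates the published average:

```python
import random

random.seed(9)

# Invented numbers: a small true effect, noisy measurements, modest trials.
true_effect, sd, n = 0.1, 1.0, 30
se = sd / n ** 0.5

def avg(xs):
    return sum(xs) / len(xs)

all_results, published = [], []
for _ in range(5_000):
    # one simulated study: the mean of n noisy observations
    m = sum(random.gauss(true_effect, sd) for _ in range(n)) / n
    all_results.append(m)
    if m > 1.96 * se:            # crude 'significant and positive' filter
        published.append(m)

print(f"average effect, all studies:    {avg(all_results):.2f}")
print(f"average effect, published only: {avg(published):.2f}")
```

The published-only average is several times the true effect, with no fraud anywhere in the pipeline.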

Bigger is usually better for sample size. The average taken from a large number of observations will usually be more informative than the average taken from a smaller number of observations. That is, as we accumulate evidence, our knowledge improves. This is especially important when studies are clouded by substantial amounts of natural variation and measurement error. Thus, the effectiveness of a drug treatment will vary naturally between subjects. Its average efficacy can be more reliably and accurately estimated from a trial with tens of thousands of participants than from one with hundreds.
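One way to see this is to simulate the same trial at two sample sizes and compare how much the estimated average bounces around. All numbers here are invented:

```python
import random

random.seed(0)

def trial_mean(n, true_effect=1.0, sd=5.0):
    """Average response in one simulated trial with n participants."""
    return sum(random.gauss(true_effect, sd) for _ in range(n)) / n

# Repeat the whole trial 200 times at each sample size.
small = [trial_mean(100) for _ in range(200)]
large = [trial_mean(10_000) for _ in range(200)]

def spread(xs):
    """Standard deviation of a list of estimates."""
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

print(f"spread of estimates, n = 100:    {spread(small):.3f}")
print(f"spread of estimates, n = 10000:  {spread(large):.3f}")
```

A hundred-fold increase in participants shrinks the scatter of the estimated effect roughly ten-fold.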

Correlation does not imply causation. It is tempting to assume that one pattern causes another. However, the correlation might be coincidental, or it might be a result of both patterns being caused by a third factor — a 'confounding' or 'lurking' variable. For example, ecologists at one time believed that poisonous algae were killing fish in estuaries; it turned out that the algae grew where fish died. The algae did not cause the deaths2.
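The estuary example can be mimicked with a toy model in which a hidden third factor drives both patterns, producing a strong correlation with no causal link in either direction:

```python
import random

random.seed(2)

# Hypothetical confounder: local conditions (low oxygen, say) that both
# kill fish and favour algae. Neither variable causes the other here.
n = 1_000
condition = [random.gauss(0, 1) for _ in range(n)]
algae = [c + random.gauss(0, 0.5) for c in condition]
fish_deaths = [c + random.gauss(0, 0.5) for c in condition]

def corr(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = corr(algae, fish_deaths)
print(f"correlation between algae and fish deaths: {r:.2f}")
```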

Regression to the mean can mislead. Extreme patterns in data are likely to be, at least in part, anomalies attributable to chance or error. The next count is likely to be less extreme. For example, if speed cameras are placed where there has been a spate of accidents, any reduction in the accident rate cannot be attributed to the camera; a reduction would probably have happened anyway.
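A simulation shows the effect: give every site the same underlying accident rate, pick the worst-looking sites in year one, and watch their counts fall in year two with no intervention at all:

```python
import random

random.seed(3)

RATE = 5       # same underlying accident rate at every site
SITES = 500

def poisson(lam):
    """Simple Poisson sampler (Knuth's multiplication method)."""
    limit, k, p = 2.718281828459045 ** -lam, 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

year1 = [poisson(RATE) for _ in range(SITES)]
year2 = [poisson(RATE) for _ in range(SITES)]

# 'Black spots': the 50 sites with the worst counts in year one.
cutoff = sorted(year1)[-50]
black = [i for i in range(SITES) if year1[i] >= cutoff][:50]

avg1 = sum(year1[i] for i in black) / len(black)
avg2 = sum(year2[i] for i in black) / len(black)
print(f"black-spot average, year 1: {avg1:.1f}")
print(f"black-spot average, year 2: {avg2:.1f} (no cameras installed)")
```

The 'improvement' at the black spots is pure regression to the mean: their second-year counts drift back towards the common rate.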

Extrapolating beyond the data is risky. Patterns found within a given range do not necessarily apply outside that range. Thus, it is very difficult to predict the response of ecological systems to climate change, when the rate of change is faster than has been experienced in the evolutionary history of existing species, and when the weather extremes may be entirely new.

Beware the base-rate fallacy. The ability of an imperfect test to identify a condition depends upon the likelihood of that condition occurring (the base rate). For example, a person might have a blood test that is '99% accurate' for a rare disease and test positive, yet they might be unlikely to have the disease. If 10,001 people have the test, of whom just one has the disease, that person will almost certainly have a positive test, but so too will a further 100 people (1%) even though they do not have the disease. This type of calculation is valuable when considering any screening procedure, say for terrorists at airports.
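The arithmetic of this worked example is worth doing explicitly. For simplicity it assumes the test never misses a true case:

```python
# The worked example: 1 true case among 10,001 people, and a test that
# catches every true case (an assumed simplification) but also flags
# 1% of healthy people.
population = 10_001
true_cases = 1
sensitivity = 1.00    # assumed: the one ill person always tests positive
specificity = 0.99    # '99% accurate': 1% of healthy people test positive

true_positives = true_cases * sensitivity
false_positives = (population - true_cases) * (1 - specificity)

p_disease_given_positive = true_positives / (true_positives + false_positives)
print(f"false positives: {false_positives:.0f}")
print(f"P(disease | positive test): {p_disease_given_positive:.3f}")
```

Only about 1 positive result in 101 belongs to someone who actually has the disease.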


Controls are important. A control group is dealt with in exactly the same way as the experimental group, except that the treatment is not applied. Without a control, it is difficult to determine whether a given treatment really had an effect. The control helps researchers to be reasonably sure that there are no confounding variables affecting the results. Sometimes people in trials report positive outcomes because of the context or the person providing the treatment, or even the colour of a tablet3. This underlines the importance of comparing outcomes with a control, such as a tablet without the active ingredient (a placebo).

Randomization avoids bias. Experiments should, wherever possible, allocate individuals or groups to interventions randomly. Comparing the educational achievement of children whose parents adopt a health programme with that of children of parents who do not is likely to suffer from bias (for example, better-educated families might be more likely to join the programme). A well-designed experiment would randomly select some parents to receive the programme while others do not.
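A sketch of why randomization helps: if an unmeasured trait is allocated to the two arms at random, the arms end up balanced on average. The 'parental education' scores below are invented:

```python
import random

random.seed(8)

# Invented 'parental education' scores for 2,000 families. Shuffling
# before allocation means neither arm systematically gets the
# better-educated families.
educ = [random.gauss(12, 3) for _ in range(2_000)]
random.shuffle(educ)
programme, control = educ[:1_000], educ[1_000:]

def avg(xs):
    return sum(xs) / len(xs)

print(f"programme arm, mean education: {avg(programme):.2f}")
print(f"control arm, mean education:   {avg(control):.2f}")
```

Had families self-selected into the programme by education, the two arm means would differ systematically rather than only by chance.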

Seek replication, not pseudoreplication. Results consistent across many studies, replicated on independent populations, are more likely to be solid. The results of several such experiments may be combined in a systematic review or a meta-analysis to provide an overarching view of the topic with potentially much greater statistical power than any of the individual studies. Applying an intervention to several individuals in a group, say to a class of children, might be misleading because the children will have many features in common other than the intervention. The researchers might make the mistake of 'pseudoreplication' if they generalize from these children to a wider population that does not share the same commonalities. Pseudoreplication leads to unwarranted faith in the results. Pseudoreplication of studies on the abundance of cod in the Grand Banks in Newfoundland, Canada, for example, contributed to the collapse of what was once the largest cod fishery in the world4.

Scientists are human. Scientists have a vested interest in promoting their work, often for status and further research funding, although sometimes for direct financial gain. This can lead to selective reporting of results and, occasionally, exaggeration. Peer review is not infallible: journal editors might favour positive findings and newsworthiness. Multiple, independent sources of evidence and replication are much more convincing.

Significance is significant. Expressed as P, statistical significance measures how likely it is that a result at least as extreme as the one observed would arise by chance alone if, in truth, there were no effect at all. Thus P = 0.01 means that, were the treatment genuinely ineffective, a result this extreme or more so would be expected in only 1 in 100 identical experiments; it is not the probability that the observed effect is a fluke. Typically, scientists report results as significant when the P-value of the test is less than 0.05 (1 in 20).
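A P-value can itself be estimated by simulation: generate many data sets under the assumption of no effect and count how often a result at least as extreme as the observed one appears. The observed difference and trial size below are invented:

```python
import random

random.seed(4)

# Invented scenario: an observed treatment-control difference of 0.5
# from a trial of 50 subjects with noise sd = 2.
observed = 0.5
n, sd = 50, 2.0

def null_trial():
    """Mean difference in one simulated trial where the true effect is zero."""
    return sum(random.gauss(0.0, sd) for _ in range(n)) / n

sims = 20_000
extreme = sum(abs(null_trial()) >= observed for _ in range(sims))
p_value = extreme / sims
print(f"simulated two-sided P-value: {p_value:.3f}")
```

Here the simulated P-value lands near 0.08: results this extreme occur about 8% of the time under pure chance, so the finding would conventionally be called non-significant.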

Separate no effect from non-significance. The lack of a statistically significant result (say a P-value > 0.05) does not mean that there was no underlying effect: it means that no effect was detected. A small study may not have the power to detect a real difference. For example, tests of cotton and potato crops that were genetically modified to produce a toxin to protect them from damaging insects suggested that there were no adverse effects on beneficial insects such as pollinators. Yet none of the experiments had large enough sample sizes to detect impacts on beneficial species had there been any5.
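Statistical power can be simulated directly: plant a real but modest effect and count how often trials of different sizes detect it. All numbers are invented:

```python
import random

random.seed(5)

# A real but modest effect (invented numbers) that small trials miss.
effect, sd = 0.3, 1.0

def detection_rate(n, sims=2_000):
    """Fraction of simulated trials whose mean clears ~2 standard errors."""
    hits = 0
    for _ in range(sims):
        m = sum(random.gauss(effect, sd) for _ in range(n)) / n
        if abs(m) > 1.96 * sd / n ** 0.5:
            hits += 1
    return hits / sims

power_small = detection_rate(10)
power_large = detection_rate(200)
print(f"detection rate with n = 10:  {power_small:.2f}")
print(f"detection rate with n = 200: {power_large:.2f}")
```

The effect is real in every simulated trial, yet the small study finds it only a minority of the time: its non-significant results say nothing about whether the effect exists.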

Effect size matters. Small responses are less likely to be detected. A study with many replicates might result in a statistically significant result but have a small effect size (and so, perhaps, be unimportant). The importance of an effect size is a biological, physical or social question, and not a statistical one. In the 1990s, the editor of the US journal Epidemiology asked authors to stop using statistical significance in submitted manuscripts because authors were routinely misinterpreting the meaning of significance tests, resulting in ineffective or misguided recommendations for public-health policy6.

Study relevance limits generalizations. The relevance of a study depends on how much the conditions under which it is done resemble the conditions of the issue under consideration. For example, there are limits to the generalizations that one can make from animal or laboratory experiments to humans.

Feelings influence risk perception. Broadly, risk can be thought of as the likelihood of an event occurring in some time frame, multiplied by the consequences should the event occur. People's risk perception is influenced disproportionately by many things, including the rarity of the event, how much control they believe they have, the adverseness of the outcomes, and whether the risk is taken voluntarily or not. For example, people in the United States underestimate the risks associated with having a handgun at home by 100-fold, and overestimate the risks of living close to a nuclear reactor by 10-fold7.
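The likelihood-times-consequence view reduces to simple arithmetic; with the invented numbers below, a rare severe event carries more expected loss than a frequent mild one:

```python
# Invented numbers: expected loss = probability × consequence.
risks = {
    "frequent, mild": (0.10, 1),      # 10% chance of a small loss
    "rare, severe": (0.001, 200),     # 0.1% chance of a large loss
}
for name, (prob, loss) in risks.items():
    print(f"{name}: expected loss = {prob * loss:.2f}")
```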

Dependencies change the risks. It is possible to calculate the consequences of individual events, such as an extreme tide, heavy rainfall and key workers being absent. However, if the events are interrelated (for example, a storm causes a high tide, or heavy rain prevents workers from accessing the site), then the probability of their co-occurrence is much higher than might be expected8. The assurance by credit-rating agencies that groups of subprime mortgages had an exceedingly low risk of defaulting together was a major element in the 2008 collapse of the credit markets.
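A simulation makes the contrast stark: two hazards that each occur 5% of the time coincide rarely if independent, but often if a common cause (a storm, in the text's example) can trigger both. The residual 1% probabilities are invented:

```python
import random

random.seed(6)

# Each hazard occurs about 5% of the time. The 1% residual probabilities
# for the linked hazards when there is no storm are invented.
sims = 100_000
p_storm = 0.05

both_indep = both_linked = 0
for _ in range(sims):
    # independent hazards
    if random.random() < p_storm and random.random() < p_storm:
        both_indep += 1
    # linked hazards: a storm brings the high tide AND keeps workers away
    storm = random.random() < p_storm
    high_tide = storm or random.random() < 0.01
    no_crew = storm or random.random() < 0.01
    if high_tide and no_crew:
        both_linked += 1

print(f"P(both hazards) if independent: {both_indep / sims:.4f}")
print(f"P(both hazards) if linked:      {both_linked / sims:.4f}")
```

The linked case is roughly twenty times more likely than the naive independent calculation suggests.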

Data can be dredged or cherry picked. Evidence can be arranged to support one point of view. To interpret an apparent association between consumption of yoghurt during pregnancy and subsequent asthma in offspring9, one would need to know whether the authors set out to test this sole hypothesis, or happened across this finding in a huge data set. By contrast, the evidence for the Higgs boson specifically accounted for how hard researchers had to look for it — the 'look-elsewhere effect'. The question to ask is: 'What am I not being told?'
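Data dredging is easy to reproduce: run many tests on pure noise and some will clear the P < 0.05 bar by chance alone:

```python
import random

random.seed(10)

# 100 independent tests on pure noise: no real effects anywhere.
n = 50
se = 1 / n ** 0.5

hits = 0
for _ in range(100):
    m = sum(random.gauss(0, 1) for _ in range(n)) / n
    if abs(m) > 1.96 * se:    # the conventional P < 0.05 bar
        hits += 1
print(f"'significant' findings among 100 null tests: {hits}")
```

On average about 5 of 100 null tests come out 'significant', which is why knowing how many hypotheses were examined matters so much.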

Extreme measurements may mislead. Any collation of measures (the effectiveness of a given school, say) will show variability owing to differences in innate ability (teacher competence), plus sampling (the pupils might by chance be an atypical group), plus bias (the school might draw from an unusually disadvantaged area), plus measurement error (outcomes might be measured in different ways for different schools). However, the resulting variation is typically interpreted only as differences in innate ability, ignoring the other sources. This becomes problematic with statements describing an extreme outcome ('the pass rate doubled') or comparing the magnitude of the extreme with the mean ('the pass rate in school x is three times the national average') or the range ('there is an x-fold difference between the highest- and lowest-performing schools'). League tables, in particular, are rarely reliable summaries of performance.
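A sketch of the league-table problem: give every school an identical underlying pass rate, and sampling variation alone still produces an apparently wide gap between 'best' and 'worst':

```python
import random

random.seed(7)

# 100 schools with an IDENTICAL underlying pass rate; each is observed
# through a cohort of only 50 pupils. All numbers are invented.
TRUE_RATE, PUPILS, SCHOOLS = 0.6, 50, 100

observed = []
for _ in range(SCHOOLS):
    passes = sum(random.random() < TRUE_RATE for _ in range(PUPILS))
    observed.append(passes / PUPILS)

observed.sort()
print(f"'worst' school pass rate: {observed[0]:.2f}")
print(f"'best' school pass rate:  {observed[-1]:.2f}")
```

A league table built from these figures would confidently rank schools that are, by construction, identical.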

Nature 503, 335–337. doi:10.1038/503335a

References

  1. Doubleday, R. & Wilsdon, J. Nature 485, 301–302 (2012).

  2. Borsuk, M. E., Stow, C. A. & Reckhow, K. H. J. Water Res. Plan. Manage. 129, 271–282 (2003).

  3. Huskisson, E. C. Br. Med. J. 4, 196–200 (1974).

  4. Millar, R. B. & Anderson, M. J. Fish. Res. 70, 397–407 (2004).

  5. Marvier, M. Ecol. Appl. 12, 1119–1124 (2002).

  6. Fidler, F., Cumming, G., Burgman, M. & Thomason, N. J. Socio-Economics 33, 615–630 (2004).

  7. Fischhoff, B., Slovic, P. & Lichtenstein, S. Am. Stat. 36, 240–255 (1982).

  8. Billinton, R. & Allan, R. N. Reliability Evaluation of Power Systems (Plenum, 1984).

  9. Maslova, E., Halldorsson, T. I., Strøm, M. & Olsen, S. F. J. Nutr. Sci. 1, e5 (2012).

Author information

Affiliations

  1. William J. Sutherland is professor of conservation biology in the Department of Zoology, University of Cambridge, UK.

  2. David Spiegelhalter is at the Centre for Mathematical Sciences, University of Cambridge.

  3. Mark Burgman is at the Centre of Excellence for Biosecurity Risk Analysis, School of Botany, University of Melbourne, Parkville, Australia.




Comments


  1. Tal Galili
    While the article offers nice simplifications of many important topics in understanding scientific claims, the description of significance (e.g. the p-value) is severely lacking. To quote Wikipedia's section "Misunderstandings of p-value" on this: 1) The p-value is not the probability that the null hypothesis is true, nor is it the probability that the alternative hypothesis is false – it is not connected to either of these. In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. Comparison of Bayesian and classical approaches shows that a p-value can be very close to zero while the posterior probability of the null is very close to unity (if there is no alternative hypothesis with a large enough a priori probability and which would explain the results more easily). This is Lindley's paradox. But there are also a priori probability distributions where the posterior probability and the p-value have similar or equal values. 2) The p-value is not the probability that a finding is "merely a fluke." As calculating the p-value is based on the assumption that every finding is a fluke (that is, the product of chance alone), it cannot be used to gauge the probability of a finding being true. The p-value is the chance of obtaining the findings we got (or more extreme) if the null hypothesis is true. https://en.wikipedia.org/wiki/P-value#Misunderstandings
  2. Patrick Elliott
    I would add in one more: #21: Understand what placebo actually does mean. Placebo isn't "does nothing". Rather, it can be very effective for pain, and even has been shown to have other physiological effects, which can even include improvements in immune responses (though, this and other effects are not as great as for pain). The reason being that its literally the "mind over matter" effect. The default assumption is, among those that don't believe in "mind over matter" is that all this stuff kind of runs on automatic, and the brain has no control over it. Ironically, the "true believers" in mind over matter seem to think that the mind (not the brain, duh...) has some sort of mystical, super-power to effect everything, if you just spend enough time meditating. The reality is a bit more.. complicated, and vastly less mystical. Pain needs to react on a "huge" scale to mental state. For a wounded animal, the ability to, unconsciously, fade out "existing" pain from their wound, such as a badly injured leg, for the duration of a fight, or escape, can be life or death. Even the ability to do things that are familiar, and thus comforting, and thus reduce the pain, might be critical to handling the stress of such injuries. In short, there is already a mechanism there, which can **selectively** knock out pain, and humans have way more complex, not just conscious perceptions, but layers of sub-conscious responses. Its hardly a surprise that, some where in there, is a way for a placebo to selectively have a major effect on pain, even over long periods of time (like for acupuncture, a practice based on the very silly idea that the body contains nine rivers of power in it, with tributaries, just like ancient China, where it was invented). Obviously, state of mind "might" be useful to enhance, or suppress, intentionally or otherwise, immunity, digestion, and a whole lot of other semi-automatic things. 
Things that, when upset by certain conditions, or even conditioned responses (I feel like running away every time someone gets sick at work, and I am the one "assigned" to clean the floors that day, for example), which may have perfectly reasonable causes (if everyone else is sick, it might be something you ate, so maybe you need to be sick too?, for example), but which can be unlearned, or otherwise modified by experience. So, rather than placebo being "does nothing", a more accurate description is, "Takes advantage of preexisting triggers and experience, to cause **limited** changes in what the body is doing." If you think about that for a moment, you can see why its huge headache for medicine research, and a major problem for the credibility of people selling things that are *not* tested as carefully. In both cases, trust in the person giving it to you, trust that it will work, belief in the method being employed, and other factors, will all "trigger" this placebo effect. Someone taking "big pharma" drugs might feel sicker taking them, but better taking something given to them by a guru, but its the lack of trust in the former, and the unreasonable trust in the later, as well as the products being given, which is having the effect, not the actual drugs themselves. Its a major pain for drug makers, to actually get provable results, which are not **purely** due to everyone in the study believing in the medicine, and the doctor. For the guru/herbalist, etc.... its less of a problem, since the people going to them are probably completely certain it will help them, and trust them, even when, from the stand point of actually treating the disease, it might be totally worthless. And, that can kill people, who would have otherwise *been* cured, by proven medications. Note: this isn't a denial of there being problems with Big Pharma, or with medicine as its practiced. 
The former is about money (and so it big-CAM, which is quickly creeping up, closer and closer to being worth as many billions), while doctors are all too often paid to offer new drugs, instead of proven ones, paid by number of procedures, instead of based on outcomes, and almost everything we rely on that *does* undeniably work, is no longer even made by Big Pharma, but by smaller, less exacting, more prone to errors, shortfalls, or price gouging, because those, often life saving drugs, are, "no longer cost effective to produce." The same reason why some promising drugs, for rare, sometimes even fatal, but possibly treatable, manageable, or even curable, diseases, never make it to market at all. Its undeniable we have problems with the industry, but.. its also undeniable that the only reason that doctors "generally" are not allowed to push placebos, but insurance companies are willing to pay for unlicensed, not recognized by the AMA, "specialists" for them instead, is because insurance companies can't have their licenses pulled for lying to their patients about the chances of something curing them. Instead, they just decide which one will cost less to give you, something that isn't proven, but a lot of people like, but it vastly cheaper, or something that might not help you anyway, but makes the doctor a lot of money, involves lots of , sometimes, unnecessary and inconclusive, lab work and tests, and then, might not help you, if you are ill enough, any more than the placebo would have. Of course the insurance companies are willing to pay for that as an option...
  4. Mike Taylor
    See also the excellent article in response by Chris Tyler (director of the UK's Parliamentary Office of Science and Technology) on Top 20 things scientists need to know about policy-making. These two articles make a very helpful complementary pair.
  5. Edward Powell
    To these twenty, I would add two more: 21) The predictive ability of computer simulations is extremely limited, and should only be relied on when substantial and detailed agreement of the simulation results with experimental data is achieved, *and* when a detailed and thorough understanding of the elements of uncertainty in the models and simulations is achieved. A true scientist tries his best to falsify, disprove, or even "break" his models and simulations in any possible way, before relying on the results. John von Neumann said, "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk." Any simulation with *any* free parameters is suspect, and detailed, conclusive, and scientifically justified causal relationships must be well understood for each and every free parameter before a simulation can even begin to be considered reliable. 22) No scientist shall, under any circumstances, refer to a simulation run as an "experiment". Simulations are not experiments; only data taken from actual reality-based experiments can be called real data.
  6. Oliver H.
    What is "reality-based"? The vast majority of basic research is not "reality-based", but based on lab experiments that do not represent reality but are actively controlled to keep certain factors fixed that would vary in "reality", so as to isolate an effect. Tissue culture is nothing BUT a simulation of what happens in actual tissue.
  7. Chris Atkins
    This is an excellent article and a subject close to my heart. It could (and should) be used with secondary (high) school students, particularly (but not exclusively) of science. For that purpose each 'tip' would benefit from a short (one page) article which describes the issue in more detail with one or two examples to bring it to life.
  8. Lee De Cola
    this is a great list, but to be widely read it needs to be shortened...i've tried and come up with 3 groupings: psychology (of scientists), experiments, and statistics - could we get along with 3 or 4 tips in each group?
  9. James Scanlan

    This is a useful list in many respects. But it overlooks what may be the most serious problem in the interpretation of data on differences between outcome rates. Virtually all efforts to interpret data on group differences in outcome rates, including data on rates of treated and control subjects in clinical trials, are undermined by failure to recognize patterns by which standard measures of differences between outcome rates tend to be systematically affected by the prevalence of an outcome. The most notable of these patterns is that by which the rarer an outcome the greater tends to be the relative difference in experiencing it and the smaller tends to be the relative difference in avoiding it. By way of example pertinent to item 20 on the list, lowering test cutoffs (or generally improving test performance) tends to reduce relative differences between rates at which two groups pass a test while increasing relative differences in rates at which they fail the test. Among other examples, reducing poverty tends to increase relative differences in poverty rates while reducing relative differences in rates of avoiding poverty; improving health tends to increase relative differences in mortality and other adverse health outcomes while reducing relative differences in survival and other favorable health outcomes; improving healthcare tends to reduce relative differences in receipt of appropriate care while increasing relative differences in rates of failing to receive appropriate care; reducing adverse lending outcomes or school suspension rates tends to increase relative differences in experiencing those outcomes while reducing relative differences in avoiding those outcomes. It is not possible to interpret data on group differences or advise policy makers on such issues without knowing these things.
Yet very few people interpreting data on differences in outcome rates know that it is even possible for the two relative differences to change in opposite directions, much less that they tend to do so systematically. A number of references explaining these and related patterns and their implications in varied contexts are listed below.

    Reference 1 provides a fairly succinct explanation of the pattern of relative differences described above and does so in the course of explaining that, as a result of the failure to understand the pattern, the US government encourages lenders and schools to take actions that make it more likely that the government will sue them for discrimination. Reference 8 explains some of the clinical implications of the failure to understand the pattern, explaining as well that the rate ratio mentioned as a measure of effect in item 20 of the list is not merely a flawed measure of effect, but an illogical one. References 9 and 10 are discussions of the above-described pattern of relative differences by other persons. The latter reference observes that governments that ignore the pattern “run the risk of guaranteeing failure, largely for conceptual and methodological reasons rather than social welfare reasons.” The observation was focused on the meeting of health inequalities reduction goals cast in relative terms. But, as reflected in references 1 through 8, failure to understand the pattern, and relative patterns by which measures tend to be affected by the prevalence of an outcome, undermines society’s understanding of a great many things.

    1. “Misunderstanding of Statistics Leads to Misguided Law Enforcement Policies” (Amstat News, Dec. 2012):
    http://magazine.amstat.org/blog/2012/12/01/misguided-law-enforcement/

    2. “Measuring Health and Healthcare Disparities,” Federal Committee on Statistical Methodology 2013 Research Conference, Nov. 5-7, 2013.
    http://jpscanlan.com/images/2013_Fed_Comm_on_Stat_Meth_paper.pdf
    http://jpscanlan.com/images/2013_FCSM_Presentation.ppt

    3. “The Mismeasure of Discrimination,” Faculty Workshop at the University of Kansas School of Law, Sept. 20, 2013:
    http://jpscanlan.com/images/Univ_Kansas_School_of_Law_Faculty_Workshop_Paper.pdf
    http://jpscanlan.com/images/University_of_Kansas_School_of_Law_Workshop.pdf

    4. “The Mismeasure of Group Differences in the Law and the Social and Medical Sciences,” Applied Statistics Workshop at the Institute for Quantitative Social Science at Harvard University, Oct. 17, 2012:
    http://jpscanlan.com/images/Harvard_Applied_Statistic_Workshop.ppt

    5. “Can We Actually Measure Health Disparities?” (Chance, Spring 2006):
    http://www.jpscanlan.com/images/Can_We_Actually_Measure_Health_Disparities.pdf

    6. “The Misinterpretation of Health Inequalities in the United Kingdom,” British Society for Populations Studies Conference 2006, Sept. 18-20, 2006:
    http://www.jpscanlan.com/images/BSPS_2006_Complete_Paper.pdf

    7. “Race and Mortality” (Society, Jan./Feb. 2000):
    http://www.jpscanlan.com/images/Race_and_Mortality.pdf

    8. Subgroup Effects subpage of the Scanlan’s Rule page of jpscanlan.com
    http://www.jpscanlan.com/scanlansrule/subgroupeffects.html

    9. Lambert PJ, Subramanian S. Disparities in Socio-Economic outcomes: Some positive propositions and their normative implications. Society for the Study of Economic Inequality Working Paper Series, ECINEQ WP 2012 – 281:
    http://www.ecineq.org/milano/WP/ECINEQ2012-281.pdf

    10. Bauld L, Day P, Judge K. Off target: A critical review of setting goals for reducing health inequalities in the United Kingdom. Int J Health Serv 2008;38(3):439-454:
    http://baywood.metapress.com/app/home/contribution.asp?referrer=parent&backto=issue,4,11;journal,7,157;linkingpublicationresults,1:300313,1

  10. Benjamin Allen
    An excellent article - well done! Translation from science into policy is made even more difficult when scientists themselves produce and promote unreliable science. Sutherland et al.'s 20 tips are well worth the read. For an ecological example that epitomises many of the 20 points they raise, see http://www.sciencedirect.com/science/article/pii/S0006320712005022
  11. Douglas Duncan
    Almost all those politicians have been to university, but these aspects of science are rarely discussed with non-science majors there. They need to be! I have been doing so for about 6 years, and have published the effects it has on students. Remember, these people have probably never been to a scientific meeting; they have little idea how scientists interact, and they sure don't learn that in a classroom, because most of us behave entirely differently there than we do with colleagues. The way scientists interact, how they arrive at consensus, what causes them to doubt (even the basic distinction between peer reviewed and non-peer reviewed publication) are things they have never encountered (according to our interviews of non-science majors). Anyone is welcome to use my curriculum, which is here (along with reference to the published results): http://casa.colorado.edu/~dduncan/pseudoscience/
  12. Bart Penders
    The public credibility of science, and the trust non-scientists place in scientific claims and the institution of science, are not primarily derived from the methodological quality of scientific inquiry. The list above is worthwhile for determining that quality (or threats to it), but not for establishing public credibility (or trust). Let me quote Harvard historian of science Steven Shapin, who wrote, eloquently: "When King Lear decided to take early retirement, he announced his intention to divide up the kingdom among his three daughters, each to get a share proportioned to the genuine love she bore him. Each is asked to testify to her love. For Goneril and Regan that presents no problem, and both use the oily art of rhetoric to good effect. Cordelia, however, trusts to the authenticity of her love and says nothing more than the simple truth. For Lear this will not do. Truth is her dower but credibility has she none. Cordelia, we should understand, was a modernist methodologist. The credibility and the validity of a proposition ought to be one and the same. Truth shines by its own light. And those claims that need lubrication by the oily art thereby give warning that they are not true". (Shapin 1995). Cordelia acts from the conviction that her claims contain the truth and that it can be accessed by those who hear them. The credibility of her claim is evident to herself - but only to herself. For a scientist to convince non-scientists, persuading them to see the truth as (s)he does, requires, alongside that truth, a construction of credibility. Scientists have very strict and shared rules to establish the credibility of their claims; these include notions of peer review, citations, transparent methodologies and much more. These conventions are part of the well-known academic credibilising strategy: the work scientists do to make their data, arguments, theories or claims into stable and uncontested truths (i.e. to achieve absolute credibility).
"The same work enables them to conduct further research by strengthening their reputation and attracting new funding. The process of gathering credibility is never-ending and cyclical: it drives the credibility cycle. Important in this process are moments of conversion, or translation. Latour and Woolgar show how time, money and effort are translated into data; data is translated into arguments, which are subsequently written down in publications. Peer scientists will read or cite them, resulting in recognition for their claim or argument. This recognition can be mobilised to support new funding requests. The new funding, translated into new personnel, projects or machineries, will produce new data, continuing the cycle. Every translation that takes place, contributes to the credibility of a given scientific claim, which extends to the researchers, laboratory or institute making the claim. If the ‘scientific wheel of credibility’ continues to turn, it will uphold or heighten the credibility of claims and claimants. Academic conventions, such as scientific publications, citations and peer review are part of this credibility cycle. The credibility cycle as a strategy to construct credibility is powerful only within the scientific community: outside the university walls, peer review and citation cultures are relatively unknown and cannot accrue similar credibility." (Penders 2013). Non-scientists (as well as scientists, when it comes to non-scientific claims) use other criteria, standards and strategies to evaluate the credibility of claims. What matters most, in real life, are the shape or form (visual or rhetorical) of a message, the medium or location of a claim (e.g. a quality newspaper vs. casual water-cooler chat), the degree to which a claim specifically addresses a problem (or question) the audience identifies with, the source of the claim (a trusted colleague or friend vs. an unknown online forum) and the relationship between the claim and the audience.
The route a scientific claim takes before it reaches any given policy-maker may matter more than its content. Who whispers it into her ear may matter more than its content, and the dominant political climate may matter more than the content of the claim. The authors wisely acknowledge that they "are not so naive as to believe that improved policy decisions will automatically follow". The reason is that any scientific claim varies wildly in its credibility over time, location and much more. That credibility is not derived from its content, but it is the reason why a claim is taken seriously, acted upon, inserted into or excluded from policy.
  13. Gavin Cawley
    For policy-related issues, the tip I would add would be: always interpret a scientific finding in the way that provides the least support for one's current position. This helps to guard against the sort of confirmation biases to which we are all susceptible. If the finding genuinely supports your position, it will still do so under its least favourable interpretation, and adopting this interpretation will demonstrate your self-skepticism.
  14. Anand Ramanathan
    The article is good at warning policy-makers about new scientific claims. However, it is a little worrisome that the same logic can be applied to reject well-established results, such as the warming over the last few decades or the effects of smoking. I would add one more point: science typically does not change overnight, and results that have been tested and confirmed over years (or decades) are unlikely to be wrong. Any new result that seeks to overthrow some bit of well-established science should be backed by a much higher standard of proof, such as independent replications or carefully controlled studies with large sample sizes.
  15. Gavin Cawley
    In the case of the warming of the last few decades, I would say this is addressed by the points "Separate no effect from non-significance" and "Data can be dredged or cherry picked", as such arguments are normally based on the lack of a statistically significant warming trend since some cherry-picked start point (e.g. the 1998 El Niño) and ignore the statistical power of the test. A test for the existence of a change in the underlying rate of warming would likely also give a non-significant result.
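The power argument above can be made concrete with a back-of-envelope calculation. This is an illustrative sketch, not a reanalysis of any temperature record: the noise level (0.15 °C interannual standard deviation) and underlying trend (0.02 °C per year) are assumed round numbers.

```python
import math

def slope_se(sigma, years):
    """Standard error of an OLS trend estimate from evenly spaced annual data
    with independent noise of standard deviation sigma."""
    n = len(years)
    mean = sum(years) / n
    sxx = sum((t - mean) ** 2 for t in years)
    return sigma / math.sqrt(sxx)

sigma = 0.15   # assumed year-to-year noise, degrees C (illustrative)
trend = 0.02   # assumed underlying warming, degrees C per year (illustrative)

se_40 = slope_se(sigma, range(40))  # four decades of data
se_10 = slope_se(sigma, range(10))  # a short window starting at a chosen year

t_40 = trend / se_40  # about 9.7: overwhelmingly significant
t_10 = trend / se_10  # about 1.2: "no significant trend", purely from low power
```

The identical underlying trend gives a t-statistic roughly eight times smaller over the ten-year window, so a non-significant short-window result says almost nothing about whether warming stopped.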
  16. Christopher Tong
    A few other tips might include an awareness of Simpson's Paradox, the Curse of Dimensionality, and the ease with which data models can overfit one data set and fail to generalize to others. The distinction between exploratory and confirmatory findings should also be emphasized. Exploratory research seeks to generate new hypotheses, and confirmatory research aims to evaluate pre-specified hypotheses. Sometimes a single study will have both confirmatory and exploratory findings, as follows. The confirmatory claims are pre-specified before the data are collected, and their consistency with the subsequent data is then evaluated. Exploratory findings are tentative findings of other, unanticipated patterns in the same data. The epistemological status of both kinds of findings must be kept distinct. Finally, data analysis methods must be chosen for fitness for purpose, and not by pattern matching, as is too often the case.
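The overfitting point above can be sketched with made-up numbers: an eight-parameter polynomial reproduces eight noisy training points exactly, yet predicts held-out points worse than a two-parameter straight line. The slope and the fixed "noise" values below are invented purely for illustration.

```python
import numpy as np

# Toy data: a noisy linear relationship (noise values fixed for reproducibility).
x = np.arange(8, dtype=float)
noise = np.array([0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 0.2, -0.1])
y = 0.5 * x + noise

linear = np.polyfit(x, y, 1)      # 2 parameters
flexible = np.polyfit(x, y, 7)    # 8 parameters: interpolates the noise exactly

x_new = np.array([1.5, 3.5, 5.5]) # held-out points between the training points
y_new = 0.5 * x_new               # the true, noise-free signal

err_linear = np.abs(np.polyval(linear, x_new) - y_new).max()
err_flexible = np.abs(np.polyval(flexible, x_new) - y_new).max()
# The flexible model fits its training data essentially perfectly,
# yet its worst held-out error is several times the linear model's.
```

The flexible fit has zero training error by construction, which is exactly why training-set fit alone cannot distinguish signal from noise.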
  17. Brad Louis
    Great list. Can I suggest a few more? We don't know everything (science is never settled; any scientist can be wrong) - that's why we keep investing in scientific research. We will know more in another hundred years, by which time a lot of what we think we know now will have been proven to be nonsense. Proper experiments produce the most robust scientific evidence - good experimental design can eliminate many of the problems mentioned in earlier tips, and observational studies and computer models are not a patch on proper experiments. Science funding supports a process, not an outcome - well-done science doesn't pretend to know the outcome of research, and diversity of research perspectives is a good thing.
  18. Ian Campbell
    This article provides a valuable summary of issues that need to be appreciated in interpreting science. However, in the environmental decision-making arena, the science is never the sole factor to be considered - there is always a value judgement as well. Science tells us the consequences of greenhouse gas emissions, or the likelihood that logging a particular area of forest will cause the extinction of a particular species, but then we have to make a value judgement. Do I value the profits I make from my power station more than the contribution it makes to global warming? Do I value the timber products I can get by logging this forest more than the possum that will become extinct? As value judgements, these decisions should be informed by the science, but are made by politicians who are expected to reflect the values of their society. Politicians are always averse to making value-judgement calls that will be unpopular with a substantial proportion of their community, or with an influential group within the community, so they often try to deflect that responsibility. Two common deflection techniques are appeals to (often dodgy) short-term economics (it would be too expensive, cost too many jobs), and raising doubts about the science, either by citing poor science or by suggesting that there is a lack of scientific consensus on the issue. It is difficult to tell whether a true lack of scientific understanding, as addressed in the article, or a misunderstanding of convenience used to deflect criticism is the greater problem.
  19. Fred Sachs
    The article does emphasize some important issues, but at the core is how one might evaluate the contributions of all these factors. The probability of success of a project cannot be calculated: a legislator would have to assign a weighting factor to each of the 20 criteria, let alone to the societal consequences, to decide on the level of support, and it can't be done. Real answers like this depend on numbers, and we don't have the information to generate the numbers. I suspect that we are going to have to put up with intuition on the part of the policy-makers. I have a few specific issues. One is that the 5% significance level is an entirely subjective choice by the researcher and has no meaning if the distribution is not Gaussian; that should be the first test. Even at 5% there is a 1 in 20 chance that the hypothesis is wrong. Does this data represent that time? Especially in clinical data, one should know the probability of getting better compared with the probability of getting worse. You should not do a one-sided test to ask if subjects got better and ignore the times they got worse. The outcome should also be weighted by the payoff, a number not easily evaluated. Did the positive patients feel better while the negative ones died? Finally, for the cause-and-effect issue I like to bring up something seen a billion times a day: the roosters crowed and then the sun came up; clearly the roosters caused sunrise.
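The one-sided-test warning above can be illustrated with an exact sign test on a hypothetical trial (the counts are invented): looking only at improvements halves the p-value and can flip the verdict at the 5% level.

```python
from math import comb

def binom_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 1/2), as in a sign test."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

improved, n = 13, 18                   # hypothetical trial: 13 of 18 patients improved
p_one_sided = binom_tail(improved, n)  # asks only "did they get better?"
p_two_sided = 2 * p_one_sided          # also counts an equally extreme worsening

# p_one_sided is about 0.048 ("significant"); p_two_sided is about 0.096 (not)
```

The same data are "significant" or not depending solely on whether the possibility of harm was admitted into the test, which is why the direction of the test should be fixed before looking at the data.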
