Overhead shot of a dining plate with a carefully arranged meal of beef steak, asparagus, potatoes and drops of balsamic vinegar

Researchers have created a star-based metric that rates the quality of the evidence for a link between a given behaviour — such as eating red meat — and a particular health outcome. Credit: Education Images/Universal Images Group/Getty

Does eating red meat reduce lifespan? Some researchers certainly think so. Work such as the Global Burden of Diseases, Injuries, and Risk Factors Study1 has led the World Health Organization and the US Department of Agriculture to advise that people limit consumption of unprocessed red meat, to protect themselves from diseases such as type 2 diabetes and various cancers.

Other researchers are less sure. Targets for red-meat consumption, set by public-health officials and expert panels, vary widely, with some advising that people eat no more than 14 grams per day and others not stating a recommended limit. This sends a confusing message, which in itself is not good for public health.

It’s not just red meat: the evidence base surrounding much nutritional and wider health advice is similarly disputed. Now, a new approach could help health policymakers to better evaluate the quality of studies assessing potential health risks. A team at the Institute for Health Metrics and Evaluation (IHME) at the University of Washington in Seattle has created a star-based metric that rates the quality of the evidence for a link between a given behaviour — such as eating red meat or smoking — and a particular health outcome2. A five-star score means that the link is clearly established; one star means either that there is no association between the two factors or that the evidence is too weak to draw a firm conclusion.
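To make the scale concrete, here is a minimal sketch, in Python, of how a one-to-five-star rating might be computed from a single summary number describing how far the most conservative estimate of a risk lies from ‘no effect’. The function name and the threshold values are illustrative assumptions, not the published IHME cut-offs.

```python
# Minimal sketch of a one-to-five-star scale. The evidence score is assumed
# to be a single number summarising how far the most conservative estimate
# of a risk lies from "no effect"; the thresholds below are illustrative
# assumptions, not the published IHME cut-offs.

def stars(evidence_score: float) -> int:
    """Map a conservative evidence score to a 1-5 star rating."""
    if evidence_score <= 0.0:   # no association, or evidence too weak to tell
        return 1
    if evidence_score < 0.15:   # weak association
        return 2
    if evidence_score < 0.40:   # moderate association
        return 3
    if evidence_score < 0.60:   # strong association
        return 4
    return 5                    # clearly established, e.g. smoking and cancer


if __name__ == "__main__":
    for score in (-0.05, 0.05, 0.30, 0.50, 0.90):
        print(f"score {score:+.2f} -> {stars(score)} star(s)")
```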

What the researchers call ‘burden of proof’ analysis does not, in itself, clear up vexing issues such as the risks of red meat or the benefits of vegetables. But as a judgement on the quality of the available research, it can help to flag to research funders the areas in which better evidence is needed to reach firmer conclusions.

How is the star rating constructed? What are its parameters — and can the methodology itself be considered rigorous research? The IHME team did several things to try to quantify the effects of various biases in the studies being assessed. An epidemiological study, for example, might be biased in different ways from a study testing the outcomes of health interventions. The researchers also did away with a common source of bias in research: the assumption that health risks increase exponentially with the parameter being studied, for example blood pressure or consumption of unprocessed red meat. And they attempted to account for the bias that can arise when sample sizes are small.
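One way to picture the ‘conservative estimate’ idea is a standard random-effects meta-analysis in which the reported effect is the bound of the uncertainty interval closest to no effect, so that disagreement between studies and small sample sizes pull the headline estimate towards the null. The sketch below is a generic illustration under that assumption, not the IHME pipeline, and the study data are invented purely for illustration.

```python
"""Sketch: a conservative pooled relative risk from several studies.

A generic random-effects meta-analysis (DerSimonian-Laird), shown only to
illustrate the idea of reporting the estimate closest to "no effect"; it is
not the IHME burden-of-proof pipeline, and the data below are invented.
"""
import numpy as np

# Hypothetical studies: log relative risks and their standard errors.
# Smaller studies have larger standard errors and therefore less weight.
log_rr = np.array([0.25, 0.10, 0.40, 0.05, 0.30])
se = np.array([0.10, 0.08, 0.25, 0.12, 0.20])

# Fixed-effect (inverse-variance) pooling.
w = 1.0 / se**2
mean_fixed = np.sum(w * log_rr) / np.sum(w)

# Between-study heterogeneity (DerSimonian-Laird tau^2).
k = len(log_rr)
q = np.sum(w * (log_rr - mean_fixed) ** 2)
tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects pooling: disagreement between studies widens the interval.
w_re = 1.0 / (se**2 + tau2)
mean_re = np.sum(w_re * log_rr) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))

# Conservative estimate: the 90% bound closest to "no effect" (log RR = 0).
if mean_re > 0:
    conservative = max(0.0, mean_re - 1.645 * se_re)
else:
    conservative = min(0.0, mean_re + 1.645 * se_re)

print(f"pooled log RR : {mean_re:.3f}  (RR ~ {np.exp(mean_re):.2f})")
print(f"conservative  : {conservative:.3f}  (RR ~ {np.exp(conservative):.2f})")
```

Under this kind of rule, a weak or inconsistent evidence base yields a conservative estimate at or near zero even when the central estimate suggests harm, which is the behaviour the low star ratings are meant to capture.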

Applying this framework to studies assessing a total of 180 questions produced results that are mostly unsurprising. Studies assessing an association between smoking and a variety of cancers, for example, earn a five-star rating3. Similarly, high systolic blood pressure — the pressure in the arteries when the heart contracts — has a five-star association with ischaemic heart disease, in which narrowed blood vessels restrict the blood supply to the heart4.

Studies assessing diet and its health outcomes get notably lower star ratings. The IHME’s analysis, for example, finds only weak evidence of an association between eating unprocessed red meat and outcomes such as colorectal cancer, type 2 diabetes and ischaemic heart disease5. It finds no clear relationship between eating unprocessed red meat and two kinds of stroke. There is stronger, but not overwhelming, evidence that eating vegetables reduces the risk of stroke and ischaemic heart disease6.

In some cases, the lower star-ratings could be due to effect size: for example, any health risks from red-meat consumption are likely to be small relative to the huge toll that smoking takes on the body. Above all, the lower-rated findings demonstrate that studies in these areas need to get better if they are to yield convincing results.

Teasing out the effect of a single dietary component from those of the complex variety of exposures over a person’s lifetime is difficult. Doing so would need larger studies, with a diverse pool of participants and strict control over their daily diets. Such studies will entail collaboration between research groups with different expertise, and access to participants in different environmental settings — a move that funders must encourage. This is an undertaking worth prioritizing. A small risk for an individual does not mean a small impact on public health: a low-risk behaviour can have a large population-level impact if it is very common.

The literature in the field of responsible research and innovation highlights how metrics in science must always be interrogated for robustness and rigour. There needs to be wide consultation and, as much as possible, the unintended consequences of using metrics must be anticipated, as initiatives such as the San Francisco Declaration on Research Assessment and the Leiden Manifesto show. This work must come sooner rather than later.

We have evidence that underpowered clinical studies, lacking the controls needed to make sense of their data, are not helping. If funders do not target their efforts at producing quality data, the public will remain confused, weary, distrustful and deprived of the information they need to make informed health and lifestyle choices.