Is there expert consensus on expert consensus?

There are as many opinions as there are experts

Franklin Delano Roosevelt

Clinical practice guidelines have an important influence on decision-making for haematopoietic cell transplants. Many national and international organizations and professional societies have formulated recommendations they believe clinicians should follow regarding who should receive a transplant, when and how. Given the paucity of high-quality evidence, such as large randomized clinical trials addressing critical questions in transplantation, guideline authors are often forced to rely on expert consensus statements, i.e., guidelines developed by an independent panel of experts, usually multi-disciplinary, convened to review and synthesize data from the biomedical literature, advance understanding of an issue and provide clinical practice recommendations.

Understanding the meaning of the words consensus and expert is critical if we are to judge the value of expert consensus statements. There are many definitions of consensus. For example, the Oxford English Dictionary defines consensus as a general agreement, derived from the Latin consens, to agree. Importantly, there is no implication that a consensus is correct. For example, there was consensus in the West in the Middle Ages that the world was flat and Earth the centre of the Universe. These notions were known to be wrong more than 2,000 years earlier by Greek and Chinese astronomers. Other definitions of consensus are less kind. Abba Eban, a former Israeli Ambassador to the United States, said: “Consensus means that lots of people say collectively what nobody believes individually.” Michael Crichton, an American physician and science fiction writer, claimed: “Historically, the claim of consensus has been the first refuge of scoundrels; it is a way to avoid debate by claiming that the matter is already settled.” Least kind is Mayer Rus, an architecture critic, who said: “Consensus is the quickest road to mediocrity.” The bottom line is that consensus on an issue is nice but may have nothing to do with accuracy (see below).

What about experts? They fare even worse than consensus. Niels Bohr, the nuclear physicist, said: “An expert is a man who has made all the mistakes which can be made in a very narrow field.” William Castle, an American entertainer, defined an expert as: “A man who tells you a simple thing in a confused way in such a fashion as to make you feel the confusion is your fault.” And especially in our world of endless scientific meetings, another definition suggests: “An expert is just somebody from out of town with slides.” Sound familiar?

Despite these irreverent definitions, people interested in the science of guidelines development know that, together with scientific evidence, consensus among experts is a basic pillar of successful guidelines. Consensus amongst content experts, methodologists, clinicians and patient representatives fulfills the claim of diversity needed to produce optimal recommendations. Consensus also ensures all participants have a voice which can influence the outcome by ensuring transparency, dealing with disagreement and resolving situations where a simple solution is impossible. Moreover, the theory of consensus is based on the respectable judgments theory, which argues that the views of a group have greater validity and reliability than the judgment of an individual, and that structured (systematic and explicit) methods for developing judgments are more valuable than informal methods (accuracy and communicability).

Consensus has a long and colourful history. In 1906 the famous statistician Sir Francis Galton (half-cousin to Charles Darwin) attended a county fair in Plymouth (England) where there was a contest to guess the weight of an ox [1, 2]. Galton collected guesses from the crowd, averaged them and compared the crowd’s median with that of several butchers (experts). His theory was that the many non-experts (uneducated in his words) in the crowd would obfuscate the guesses of the few experts. However, the median of the crowd’s guesses, 1197 lbs., was within 1 lb. of the correct weight and much closer than any individual guess of the crowd or of the butchers [3]. It seems Galton may have fudged things a bit [4]. A recent internet-based repeat of this experiment by National Public Radio compared responses of 17,205 random respondents who looked at the photo of a cow named Penelope (no relation to Odysseus) and guessed a median weight of 1287 lbs. vs. a real weight of 1355 lbs., only 68 lbs. off [5]. A panel of 600 self-declared experts did slightly worse. More interestingly, the expert guesses ranged from 500–2000 lbs., suggesting the value of expert opinion depends very much on whom and how many experts you query. These data are shown in Fig. 1. These concepts are reviewed in reference [6].

Fig. 1 How much does this cow weigh?
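The aggregation described above, taking the median of many independent estimates and comparing it with each individual estimate, can be sketched in a few lines of Python. This is only an illustration: the estimates, the true weight and the function name crowd_summary are hypothetical and are not Galton's or National Public Radio's data.

```python
from statistics import median

def crowd_summary(estimates, true_value):
    """Compare the crowd's median estimate with each individual estimate."""
    crowd_median = median(estimates)
    crowd_error = abs(crowd_median - true_value)
    # Count individuals who were strictly closer to the truth than the median.
    n_better = sum(abs(e - true_value) < crowd_error for e in estimates)
    return crowd_median, crowd_error, n_better

# Hypothetical estimates in lbs (illustrative only, not real contest data).
estimates = [950, 1050, 1120, 1180, 1210, 1250, 1300, 1400, 1600]
true_weight = 1200  # hypothetical true weight in lbs

crowd_median, crowd_error, n_better = crowd_summary(estimates, true_weight)
print(f"median={crowd_median} lbs, error={crowd_error} lbs, "
      f"individuals closer than the median: {n_better}")
```

With these made-up numbers the median is off by 10 lbs. and no individual does better, but a different set of estimates could easily give a different picture, which is the point made above about whom and how many you query.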

How much can we rely on expert consensus statements? Physicians, professional societies and health authorities should analyze the validity of these statements based on how they are developed and the quality of the evidence. Consensus panels use mostly informal processes designed to deal with the challenges of group discussions. However, informal processes are vulnerable to the idiosyncrasies of small group size, uncontrolled interactions, fiscal and time constraints, fatigue, lack of expertise in content or methods, variable or inappropriate leadership and conflicts-of-interest. These variables threaten the integrity of the process. In one recent analysis, almost 100 consensus statements were evaluated. The rigor of development score for consensus statements across three cancer journals (Current Oncology, European Journal of Cancer and Journal of Clinical Oncology) was one-third lower than that of evidence-based recommendations. The editorial independence score was also 15% lower for consensus statements [7].

Strategies to deal with the problems inherent to consensus methodology rely on formalized processes in which all panelists contribute equally. They consist of highly-structured, reiterative processes such as the RAND-Delphi or RAND/UCLA expert consensus panel process (https://en.wikipedia.org/wiki/Delphi_method). However, applying to medical problems a consensus methodology suitable for the social sciences does not guarantee validity. Whether these complex structures add value to consensus-based recommendations is unproved (reviewed in reference [8]). However, there are reasonably high levels of internal and external validation for these complex processes.

Expert opinion is not a surrogate for evidence-based data; science is not about consensus, it’s about the truth [9]. An example of applying the Delphi method to a transplant-related question is given in reference [10]. This expert panel, using a (seemingly) sophisticated consensus process used to develop nuclear weapons and jet fighter aircraft, incorrectly recommended high-dose chemotherapy and an autotransplant for some women with high-risk breast cancer. Subsequent randomized trials showed this approach to be ineffective [11,12,13].

Clinical practice guidelines are part of a movement termed evidence-based medicine. Proponents of this approach claim medical practice should be data-driven. Reasonable. However, it has become increasingly clear that the strength of guideline recommendations depends on the quality of the evidence they are based on which, unfortunately, is often poor. In several surveys of clinical studies published in high-quality medical journals, about one-half of interventions were subsequently shown to be either unproved, ineffective or harmful [14,15,16]. Such changes, referred to as medical reversals, are reviewed elsewhere [17]. Widespread use of high-dose chemotherapy and autotransplants in women with high-risk breast cancer, cited above, is a relevant example of a medical reversal [18]. Other recent medical reversals have far greater consequences. Some examples. It now seems there is no benefit of giving statins to healthy persons 40–75 years old with no history of cardio-vascular disease, no risk factors and a projected 10-year risk of heart disease <7.5–10% [19]. This includes many people in the US and EU currently taking statins. It also seems that after 40 years and 500,000 procedures per year in the US and EU, percutaneous coronary intervention in persons with stable angina is no better than medical therapy [20]. The bottom line is that we often know much less than we think.

On the other side of the evidence-based medicine debate are people who believe medicine is more an art than a science and that limiting medical practice to expert consensus statements and clinical practice guidelines removes focus from the individual patient. Both viewpoints may be correct [21].

We are not alone in our quandary of discordant guidelines. For example, a new study looked at 5 recently published clinical practice guidelines for using statins to prevent atherosclerotic cardio-vascular events in a population of about 45,000 otherwise healthy persons [22]. Applying different guidelines resulted in a recommendation to give statins to as few as 15% to as many as 44% of persons. Estimated reductions in events, depending on which guideline was used, ranged from a low of 13% to a high of 34%. If there is this much uncertainty about a relatively simple intervention such as giving statins, where there are huge datasets, imagine the situation in the haematopoietic cell transplant arena.

At this point we need to deal with two more commonly misunderstood terms: accuracy and precision. These are distinct concepts in statistics. Accuracy refers to getting the correct answer, precision to getting the same answer on repetition regardless of whether it is the correct answer or not. A wrong answer which is reproducible is precise but inaccurate. What we need are accurate, precise answers. However, if answers are imprecise some or all of them must be inaccurate. With this in mind let’s look at two clinical practice guidelines for therapy of persons with acute myeloid leukaemia < 60 years of age in first complete remission. The 2017 National Comprehensive Cancer Network (NCCN) guideline for acute myeloid leukaemia suggests several therapies as being comparable: (1) a clinical trial; (2) an allotransplant from a sibling or alternative donor; or (3) high-dose cytarabine [23]. The European LeukemiaNet (ELN) recommendation for the same setting states: “Allogeneic HCT [haematopoietic cell transplant] is generally recommended when the relapse incidence without the procedure is expected to be >35–40%” [24]. How do we reconcile these somewhat discordant recommendations? We cannot. But both cannot be correct. We can add to these the recommendations from the American Society of Blood and Marrow Transplantation and the European Bone Marrow Transplant Group, which also differ somewhat from each other and from those of the NCCN and ELN.
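The distinction between accuracy and precision drawn above can be made concrete with a small numerical sketch. Here we assume, purely for illustration, that inaccuracy is summarized by bias (the mean answer minus the correct answer) and imprecision by the spread (population standard deviation) of repeated answers. The data and the function name accuracy_and_precision are hypothetical.

```python
from statistics import mean, pstdev

def accuracy_and_precision(answers, truth):
    """Return (bias, spread): bias reflects inaccuracy, spread reflects imprecision."""
    bias = mean(answers) - truth   # zero when the answers are correct on average
    spread = pstdev(answers)       # zero when every repetition gives the same answer
    return bias, spread

truth = 100.0  # hypothetical correct answer

# The same wrong answer every time: precise but inaccurate.
precise_but_inaccurate = [120.0, 120.0, 120.0, 120.0]
# Correct on average, but no two answers agree: imprecise.
accurate_on_average = [80.0, 120.0, 90.0, 110.0]

bias_a, spread_a = accuracy_and_precision(precise_but_inaccurate, truth)
bias_b, spread_b = accuracy_and_precision(accurate_on_average, truth)
print(bias_a, spread_a)
print(bias_b, round(spread_b, 2))
```

In the second set the mean is correct, yet every individual answer misses the truth, which matches the remark above that imprecise answers imply some or all of them are inaccurate.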

As discussed, there are 2 fundamental issues in evaluating the validity of expert consensus statements: (1) how they were developed; and (2) the quality of evidence considered. Consensus-based and evidence-based recommendations are both intended to provide guidance to physicians; however, they differ. Evidence-based guidelines produce statements informed by a systematic review of the evidence. They use structured approaches to collect, analyze and summarize relevant data to produce and grade recommendations. This approach is illustrated by the method suggested by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group, which has developed an increasingly widely-adopted structure for guidelines development. Organizations such as Appraisal of Guidelines for Research and Evaluation [25], Institute of Medicine [26] (now National Academy of Medicine), Guidelines International Network [27] and Oxford Univ. Centre for Evidence-Based Medicine [28] developed criteria to ensure objective, scientifically valid and consistent standards for development and reporting of high-quality guidance documents.

The high impact of evidence quality on the validity of guidelines raises the important issue of what is the minimum level of evidence necessary to produce valid consensus-based recommendations, an issue which also reflects the companion issue of applicability of recommendations to clinical practice. An extensive review of this topic is beyond the scope of this Editorial but we summarize these topics in three accompanying Tables useful for judging quality of evidence, strength of recommendations and certainty of guidelines to clinical practice (Tables 1–3). Not all evidence is of comparable quality and experts should identify the quality of evidence underlying their consensus statements and the strength of their recommendations. The validity of recommendations is a separate issue which should be evaluated by persons external to the authors using the GRADE approach, a method of assessing the certainty of evidence (also known as quality of evidence or confidence in effect estimates) and the strength of recommendations in health care [29].

Table 1 Scale of quality of evidence for a therapy recommendation
Table 2 Grades of recommendation
Table 3 Grading of recommendations assessment, development and evaluation (GRADE)

Some argue another purpose of expert consensus statements and clinical practice guidelines is to standardize therapy regardless of validity. Is this a valid goal? Yes and no. Take, for example, nuclear reactor design. In France, most nuclear reactors have the same design, which works well. When a problem is detected in one reactor, a fix can be applied to the others. Contrast this with the US, where almost every reactor design is different. A fix for a problem detected at one reactor is not easily applied to potential problems at others. This seems a strong argument for standardization. However, what if France had selected a bad reactor design for all its reactors? The history of the economy of the Soviet Union in the 20th century is a good example of standardization with disastrous consequences.

Guidance documents are an essential part of oncology care and should be subjected to a rigorous, validated development process. The bottom line is that expert consensus statements are likely to be effective in standardizing transplant strategies but this is not necessarily a virtue. It is difficult to know which recommendations will withstand scrutiny in future randomized clinical trials. However, such trials are unlikely to be done and, if done, their conclusions are unlikely to be widely believed. The challenge is for haematologists to make appropriate patient-level decisions taking into consideration potential benefits and risks.

We realize most readers of Bone Marrow Transplantation like or even love expert consensus statements and clinical practice guidelines. So do we. But not all of these are created equal; some are brilliant and useful and others ill-conceived, useless or even dangerous. And there is always the danger that adopting these blindly, similar to protocols and standard operating procedures, inhibits creative thinking. We acknowledge our comments are iconoclastic but we think them sensible (naturally). We finish with a quote from John Kenneth Galbraith, the Harvard economist: “One of my greatest pleasures in my writing has come from the thought that perhaps my work might annoy someone of comfortably pretentious position. Then comes the realization that such people rarely read.”

References

1. Endnote: the correct statistical term is estimate, not guess.
2. Galton F. Vox Populi. Nature. 1907;75:450–51.
3. Endnote: Galton considered the butchers to be experts.
4. Wallis KF. Revisiting Francis Galton’s forecasting competition. Stat Sci. 2014;29:420–24.
5. https://www.npr.org/sections/money/2015/08/07/429720443/17-205-people-guessed-the-weight-of-a-cow-heres-how-they-did. Accessed 8 Jan 2018.
6. Surowiecki J. The wisdom of crowds: why they are smarter than the few and how collective wisdom shapes business, economics, societies and nations. New York, NY: Anchor Books; 2004.
7. Jacobs C, Graham JD, Makarski J, Chasse M, Fergusson D, Hutton B, et al. Clinical Practice Guidelines and consensus statements in oncology – an assessment of their methodological quality. PLoS ONE. 2014. https://doi.org/10.1371/journal.pone.0110469.
8. Kea B, Sun BJ. Consensus development for healthcare professionals. Intern Emerg Med. 2015;10:373–83.
9. Endnote: We acknowledge the truth may not be knowable with complete accuracy and precision because of 2 unavoidable considerations: measurement error and chance.
10. Gale RP, Park RE, Dubois R, Bitran J, Buzdar A, Hortobagyi G, et al. Delphi-consensus panel analysis of appropriateness of high-dose chemotherapy and blood cell or bone marrow autotransplants in women with breast cancer. Clin Transplant. 2000;14:32–41.
11. Tallman MS, Gray R, Robert NJ, LeMaistre CF, Osborne CK, Vaughn WP, et al. Conventional adjuvant chemotherapy with or without high-dose chemotherapy and autologous stem-cell transplantation in high-risk breast cancer. N Engl J Med. 2003;349:17–26.
12. Stadtmauer EA, O’Neill A, Goldstein LJ, Crilley PA, Mangan KF, Ingle JN, et al. Conventional-dose chemotherapy compared with high-dose chemotherapy plus autologous hematopoietic stem-cell transplantation in metastatic breast cancer. Philadelphia Bone Marrow Transplant Group. N Engl J Med. 2000;342:1069–76.
13. Berry DA, Ueno NT, Johnson MM, Lei X, Caputo J, Rodenhuis S, et al. High-dose chemotherapy with autologous stem-cell support as adjuvant therapy in breast cancer: overview of 15 randomized trials. J Clin Oncol. 2011;29:3214–23.
14. Prasad V, Gall V, Cifu A. The frequency of medical reversal. Arch Intern Med. 2011;171:1675–6.
15. Ioannidis JP. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294:218–28.
16. Space for BR Med J. http://bestpractice.bmj.com/info/us/
17. Prasad VK, Cifu AS. Ending medical reversal: improving outcomes, saving lives. 1st Edition. Baltimore, MD: Johns Hopkins University Press; 2015.
18. Howard DH, Kenline C, Lazarus HM, Lemaistre CF, Maziarz RT, McCarthy PL Jr, et al. Abandonment of high-dose chemotherapy/hematopoietic cell transplants for breast cancer following negative trial results. Health Serv Res. 2011;46:1762–77.
19. https://www.uspreventiveservicestaskforce.org/Page/Document/RecommendationStatementFinal/statin-use-in-adults-preventive-medication1. Accessed 6 Jan 2018.
20. Al-Lamee R, Thompson D, Dehbi HM, Sen S, Tang K, Davies J, et al. Percutaneous coronary intervention in stable angina (ORBITA): a double-blind, randomized controlled trial. Lancet. 2018;391:31–40.
21. https://www.nytimes.com/2017/12/27/upshot/what-we-mean-when-we-say-evidence-based-medicine.html. Accessed 30 Dec 2017.
22. Mortensen MB, Nordestgaard BG. Comparison for statin use in primary prevention in a contemporary general population. Ann Intern Med. 2018;168:85–92.
23. https://www.nccn.org/professionals/physician_gls/pdf/aml.pdf. Accessed 30 Dec 2017.
24. Döhner H, Estey E, Grimwade D, Amadori S, Appelbaum FR, Büchner T, et al. Diagnosis and management of AML in adults: 2017 ELN recommendations from an international expert panel. Blood. 2017;129:424–47.
25. Brouwers MC, Kho ME, Browman GP, Burgers JS, Cluzeau F, Feder G, et al. AGREE II: advancing guideline development, reporting and evaluation in health care. CMAJ. 2010;182:E839–42.
26. Graham RMM, Miller Wolman D, Greenfield S, Steinberg E, Editors. Clinical practice guidelines we can trust: Institute of Medicine (US) Committee on Standards for Developing Trustworthy Clinical Practice Guidelines. Washington (DC): National Academies Press; 2011.
27. Guidelines International Network. http://www.g-i-n.net/. Accessed 19 Dec 2017.
28. http://www.cebm.net. Accessed 31 Dec 2017.
29. http://www.gradeworkinggroup.org/. Accessed 31 Dec 2017.


Acknowledgements

Prof. Hillard Lazarus (Case Western Reserve Univ.) kindly reviewed the typescript. RPG acknowledges support from the National Institute of Health Research (NIHR) Biomedical Research Centre funding scheme.

Author information

Corresponding author

Correspondence to Robert Peter Gale.

Ethics declarations

Conflict of interest

RPG is a part-time employee of Celgene Corp.

Cite this article

Barosi, G., Gale, R.P. Is there expert consensus on expert consensus?. Bone Marrow Transplant 53, 1055–1060 (2018). https://doi.org/10.1038/s41409-018-0128-2
