We thank our colleagues from the USA for their News & Views article (Bilimoria, K. Y. & Pawlik, T. M. Risk-adjusting away volume as a quality metric for surgical oncology: a perspective worth re-visiting. Nat. Rev. Clin. Oncol., https://doi.org/10.1038/s41571-022-00609-1; 2022)1, which provides a critical appraisal of our recent analysis of risk-standardized mortality rates (RSMRs) compared with procedural volume as a quality proxy in surgical oncology2. We appreciate their nuanced perspective. Over the past 30 years, research on quality improvement has generated an enormous amount of data on this topic, and the sheer number of details can cause the reader to lose sight of the most important aspects of reliability and validity in outcomes research. In their article, Bilimoria and Pawlik1 state that "the use of RSMR as a quality proxy has been criticized on both theoretical and methodological grounds", supporting this claim with literature predominantly published in the 1990s. Within the past 20 years, however, methodological approaches have improved considerably, for example, owing to a better understanding of coding practices and their reliability, of differences between hospital RSMRs and of the effects of these differences on long-term survival outcomes3,4.

Summarizing this progress, Krell et al.5 have defined the necessary prerequisites for a reliable analysis of surgical outcome measures, considering that "outcomes with low reliability can mask both poor and outstanding performance": (1) sampling 100% of certain procedures to avoid type I and II errors; (2) hierarchical reliability adjustment, which shrinks a provider's risk-adjusted outcome rate towards the overall mean rate to account for variation in surgical outcomes attributable to 'noise' (especially for hospitals with low caseloads); and (3) using composite quality indicators, such as hospital availability. Notably, and in contrast to many other studies, our analysis met all of these methodological criteria. We further enhanced the reliability of our analysis by using training and validation cohorts. Bilimoria and Pawlik1 took issue with the small sample sizes of the very low-volume providers included in the analysis; however, this concern has already been addressed through sensitivity analyses performed by Chiu et al.6, who showed only a slight influence of small sample sizes on realignment efficiency. Nevertheless, reliability adjustments are limited by the variables available in the study database, and RSMR-classified surgical departments with low caseloads therefore remain an interesting subject for further investigation.

Bilimoria and Pawlik1 also state that data from the USA have consistently demonstrated lower mortality rates for high-volume than for low-volume hospitals. We, however, have found several studies suggesting otherwise, namely that hospitals with a high surgical volume do not automatically achieve the best results7,8,9. Of note, almost none of the low-volume hospitals included in our analysis performed well with regard to complex surgical procedures, suggesting that volume is indeed a relevant factor in quality assessment2.
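To illustrate criterion (2), the short Python sketch below shows the widely used empirical Bayes form of reliability adjustment. A hospital's observed rate is weighted by its reliability, that is, the share of observed variation attributable to true between-hospital differences (signal) rather than to sampling noise, and the remainder of the weight goes to the overall mean. This is a simplified illustration under stated assumptions, not a reproduction of the hierarchical models used in our analysis or by Krell et al.5: the function name `reliability_adjust`, the binomial noise term and the between-hospital variance value are assumptions, and all numbers are hypothetical.

```python
def reliability_adjust(observed_rate, n_cases, overall_rate, signal_var):
    """Shrink a hospital's observed rate towards the overall mean rate.

    Simplified empirical Bayes sketch (an assumption, not a published model):
    noise is approximated by the binomial sampling variance of the overall rate,
    and signal_var is an assumed between-hospital variance of true rates.
    """
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    # Sampling (noise) variance shrinks as the caseload grows
    noise_var = overall_rate * (1 - overall_rate) / n_cases
    # Reliability: fraction of observed variance that reflects true differences
    reliability = signal_var / (signal_var + noise_var)
    # Weighted average of the hospital's own rate and the overall mean
    adjusted = reliability * observed_rate + (1 - reliability) * overall_rate
    return adjusted, reliability


if __name__ == "__main__":
    # Hypothetical inputs: 5% overall mortality, between-hospital SD of 2 percentage points
    overall, signal = 0.05, 0.02 ** 2
    # Low-volume hospital: 2 deaths in 10 cases (20%), shrunk to about 6.2%
    print(reliability_adjust(0.20, 10, overall, signal))
    # High-volume hospital: 32 deaths in 400 cases (8%), stays near 7.3%
    print(reliability_adjust(0.08, 400, overall, signal))
```

The hypothetical low-volume hospital's extreme observed rate is pulled almost entirely back to the mean because its reliability is low (about 0.08), whereas the high-volume hospital retains most of its own signal (reliability about 0.77). This is why unadjusted rankings of hospitals with low caseloads can be dominated by noise.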

When Bilimoria and Pawlik state that "one-third of surgery-related deaths occur after discharge"1, thus limiting the validity of in-hospital mortality, they are probably referring to data from the USA. Owing to a differently structured health-care system, the average hospital stay is substantially longer in Germany than in the USA, which means that in-hospital mortality adequately represents postoperative mortality in Germany10. Ninety-day mortality, long-term survival and other parameters are certainly of interest as quality measures, but they are not recorded universally.

Ultimately, we as the surgical oncology community are all concerned with the question of how we should define and improve quality. After all, even RSMR is only a means to an end. Nevertheless, RSMRs illustrate the problems of the volume pledge: quantity is important but does not automatically translate into quality. Our work demonstrates that the exclusive use of volume as a quality indicator has several shortcomings2. Instead of focusing entirely on finding an optimal volume threshold, the surgical oncology community should also be looking for other ways in which medium-volume and high-volume centres can continue to innovate. Is it not more meaningful to strive to improve outcomes rather than merely clearing a volume threshold, given that we have learned that both risk adjustment and volume matter? The inclusion of process measures (such as the use of multimodal therapy) and long-term outcomes (including 90-day mortality and classical survival end points, among others), although desirable, is often not feasible on a national scale. We should use the data that are now available for 99% of the population (at least in Germany) to develop incentives for quality improvement in complex surgery that improve patient outcomes, rather than relying solely on increasing raw case numbers.