The Art of Statistics: Learning from Data. David Spiegelhalter. Pelican (2019)
The aphorism “All models are wrong, but some are useful”, attributed to statistician George Box, is a cliché for a reason. It cuts to the heart of a central challenge facing researchers in many areas of science. The world is more complicated than anything that a mathematical, scientific or statistical model can capture. Yet models of the world, however imperfect, are necessary for drawing conclusions about everything from pharmaceutical efficacy to unemployment numbers. David Spiegelhalter’s The Art of Statistics shines a light on how we can use the ever-growing deluge of data to improve our understanding of the world, and on some of the pitfalls we encounter in the attempt.
The book is part of a trend in statistics education towards emphasizing conceptual understanding rather than computational fluency. Statistics software can now perform a battery of tests and crunch any measure from large data sets in the blink of an eye. Thus, being able to compute the standard deviation of a sample the long way is seen as less essential than understanding how to design and interpret scientific studies with a rigorous eye.
Throughout the book, Spiegelhalter emphasizes the importance of the “PPDAC” structure: Problem-Plan-Data-Analysis-Conclusion. He describes how statisticians approach each section of an investigation, and the tools that come into play. PPDAC starts with defining a problem or question and developing a plan for what to measure, how to measure it and what analyses will serve best. Then researchers collect data, analyse them according to the plan and decide what conclusions reasonably follow.
Spiegelhalter has had a long and prominent career as a statistician, and some of the most interesting passages occur when he pulls back the curtain and describes the choices he and his colleagues have made in the course of research. For example, he led a team that investigated death rates in children after heart surgery in British hospitals. Even for seemingly unambiguous figures — such as the number of children who had heart surgery and how many died — there were discrepancies between data sources and definitions.
Decisions about how to treat potential ambiguities and edge cases can have a large effect on the outcome of a study. For instance, for how long after an operation should a death be attributable to it? Because complete clarity and objectivity are not possible, people who read about these studies — especially politicians or jurors who have to decide whether unusual numbers of deaths owing to surgery point to crimes or malpractice — should be aware of the potential complicating factors, if only to temper their confidence in a finding.
As such examples show, a main takeaway from the book is a sense of circumspection about our confidence in what is known. As Spiegelhalter writes, the point of statistical science is to ease us through the stages of extrapolation from a controlled study to an understanding of the real world, “and finally, with due humility, be able to say what we can and cannot learn from data”. That humility can be lacking when statistics are used in debates about contentious issues such as the costs and benefits of cancer screening.
The book does an admirable job of covering a great deal of ground in limited space. Some concepts would have benefited from a deeper treatment: notably, bootstrapping, or estimating the sampling distribution of a statistic by repeatedly resampling the observed data at random, with replacement; and the central limit theorem, which holds that, for many underlying distributions, the distribution of sample averages tends towards a normal distribution as the sample size grows. However, Spiegelhalter had difficult decisions to make about how much of each topic he would unpack. A book covering the ideas of regression, null-hypothesis testing, Bayesian inference and much, much more cannot be comprehensive. The robust notes and bibliography will be useful for readers who wish to delve deeper.
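For readers who want a feel for the two ideas singled out above, a minimal sketch in Python may help. It is not taken from the book: the data are invented (skewed draws from an exponential distribution), the function names `bootstrap_means` and `percentile_interval` are hypothetical, and the simple percentile interval is only one of several ways to summarize a bootstrap distribution.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Resample `sample` with replacement; return the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval taken from a list of bootstrap statistics."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1 - tail) * len(ordered)) - 1]
    return lower, upper


# Invented, deliberately skewed data, used purely for illustration.
data_rng = random.Random(42)
sample = [data_rng.expovariate(1.0) for _ in range(200)]

means = bootstrap_means(sample)
low, high = percentile_interval(means)

print(f"sample mean: {statistics.fmean(sample):.3f}")
print(f"95% bootstrap interval for the mean: ({low:.3f}, {high:.3f})")
```

Although the raw data here are strongly skewed, a histogram of `means` would look roughly bell-shaped, which is the behaviour the central limit theorem describes for sample averages. The spread of those resampled means is what the bootstrap uses as a stand-in for the uncertainty in the original estimate.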
Spiegelhalter does not shy away from discussions of subtle statistical issues such as the nature of different types of uncertainty. So, as he warns at the beginning of chapter 9, where the rubber of mathematical probability theory hits the road of statistical inference, some material will prove challenging even to scientifically sophisticated readers. Some passages require pencil, paper and a few passes through to fully digest, but the approachable big-picture explanations and end-of-chapter summaries help, as does the glossary.
A useful coda focuses on the many dubious statistical practices that have helped to create today’s replication crisis across swathes of science. Spiegelhalter touches, too, on how both scientists and journalists can improve public understanding by running better studies and reporting on them responsibly. After wading into the statistical depths earlier in the book, readers can start using these tangible, easily applicable lessons immediately.
The Art of Statistics will serve students well. And it will be a boon for journalists eager to use statistics responsibly — along with anyone who wants to approach research and its reportage with healthy scepticism.
Nature 567, 458-459 (2019)