Informatics: Make sense of health data

Elliott, Julian H.; Grimshaw, Jeremy; Altman, Russ; Bero, Lisa; Goodman, Steven N.; Henry, David; Macleod, Malcolm; Tovey, David; Tugwell, Peter; White, Howard; Sim, Ida

doi:10.1038/527031a

Comment
Published: 04 November 2015

Nature volume 527, pages 31–32 (2015)

Develop the science of data synthesis to join up the myriad varieties of health information, insist Julian H. Elliott, Jeremy Grimshaw and colleagues.

Credit: Illustration by David Parkins

If you are wondering whether exposure to some chemical could increase your chances of getting colon cancer, you could easily find supportive evidence from animal experiments. You might then discover that epidemiological studies tell a different story.

There have never been more options when it comes to measuring factors relevant to health. We can sequence our entire genomes and those of our bacteria, viruses and tumours. In principle, every visit to the doctor can be tracked from electronic medical records. Information on physiology, behaviours, diets, movements and interactions with others can be extracted from wearable devices, smartphone apps and social-networking sites¹. And thanks to the open-access movement and a shift in data-sharing norms, more data are being made publicly available.

Yet sifting through the information to find answers to questions about health is becoming increasingly difficult, even for the experts. The data exist in disparate domains, are generated using different methods, and are stored in different infrastructures — from the private servers of hospitals to global platforms, such as dbGaP, an open database of genotypes and clinical information.

Pooling data

We believe that to consolidate data from different sources into comprehensive and coherent bodies of evidence on which decision-makers can act, researchers need to better exploit current methods and tools for data synthesis — and to develop superior ones.

Researchers usually try to obtain insights by pooling the same kind of data, such as from clinical trials. But because different study and data types tend to have distinct strengths and weaknesses, a much richer understanding can emerge when different kinds of information are combined.

The drug cisapride, for instance, was licensed in the United States in 1993 to treat heartburn, on the basis of data collected in clinical trials over ten years. Yet the drug's association with fatal heart-rhythm disturbances² was understood only when data from clinical trials were consolidated with those from large, long-term cohort studies, which recorded cisapride's effects in thousands of people.

Likewise, the picture obtained from conventional influenza surveillance (which involves collecting data from primary-care clinics) can lag behind what is actually happening on the ground. Google collects real-time information based on the use of search terms related to flu symptoms, but these findings can be inaccurate. The best insights almost certainly come from aggregating these different data types³.

So how can we bring together the multiple, extremely diverse data sets that are now becoming available?

Formal methods for 'evidence synthesis' — in which multiple sources of data are combined to obtain new insights — were first developed in the social sciences in the 1970s. The techniques have since been adapted in many branches of science, and they underpin high-impact decision-making, for example in drug licensing⁴. They generally involve identifying and collating all the available and relevant data; assessing each data source's strengths and vulnerability to bias; and deciding how to handle the different sources of data depending on their rigour and the question being asked (some data may be excluded, for instance). Then, if appropriate, a meta-analysis or qualitative assessment can be conducted, incorporating the information⁵.
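The final pooling step described above can be illustrated with a minimal Python sketch. It assumes each study has already been reduced to an effect estimate (for example a log odds ratio) and a sampling variance, and it uses two standard techniques: inverse-variance fixed-effect pooling and the DerSimonian-Laird random-effects estimator. The function names and the numbers in the usage lines are illustrative assumptions, not taken from the article or its references.

```python
import math


def pool_fixed_effect(estimates, variances):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def pool_random_effects(estimates, variances):
    """DerSimonian-Laird random-effects pooled estimate and standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed, _ = pool_fixed_effect(estimates, variances)
    # Cochran's Q measures how far the studies scatter around the fixed estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = total - sum(w * w for w in weights) / total
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Re-pool with each study's variance widened by tau2.
    return pool_fixed_effect(estimates, [v + tau2 for v in variances])


# Made-up effect estimates and variances for three hypothetical studies.
effects = [0.10, 0.35, 0.60]
variances = [0.04, 0.02, 0.05]
print(pool_fixed_effect(effects, variances))
print(pool_random_effects(effects, variances))
```

When the studies disagree more than their sampling error would explain, the random-effects version widens the standard error rather than hiding the disagreement, which is one reason it is often preferred when study types differ.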

For example, a UK group combined⁶ data from clinical trials with those from cohort studies in a meta-analysis to assess the effectiveness of anti-D, a drug given to some pregnant women to prevent them from producing antibodies against their babies. In this case, potential sources of bias, such as different clinics providing care for the women in cohort studies, were systematically identified, and their impact was minimized.
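One simple way to picture how identified biases can be folded into a meta-analysis is an additive bias model: each study's estimate is shifted by its expected bias, and its variance is widened by the uncertainty about that bias, before pooling. The Python sketch below is a deliberately simplified illustration under that assumption and is not the procedure used in the cited study. The `Study` fields, the bias values and the study results are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    label: str
    estimate: float         # e.g. a log odds ratio
    variance: float         # sampling variance of the estimate
    bias_mean: float = 0.0  # assumed expected additive bias (hypothetical input)
    bias_var: float = 0.0   # assumed uncertainty about that bias


def bias_adjusted_pool(studies):
    """Shift each estimate by its assumed bias, widen its variance, then pool."""
    adjusted, weights = [], []
    for s in studies:
        adjusted.append(s.estimate - s.bias_mean)
        weights.append(1.0 / (s.variance + s.bias_var))
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, adjusted)) / total
    return pooled, math.sqrt(1.0 / total)


# Made-up inputs: a trial judged unbiased and a cohort study judged to
# overstate the benefit, with considerable uncertainty about by how much.
studies = [
    Study("trial", estimate=-0.5, variance=0.04),
    Study("cohort", estimate=-0.9, variance=0.01, bias_mean=-0.2, bias_var=0.03),
]
print(bias_adjusted_pool(studies))
```

In this sketch the cohort study starts out more precise than the trial, but after adjustment it carries no more weight, because uncertainty about its bias counts against it.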

Yet many researchers immersed in the combination and analysis of large data sets that are vulnerable to spurious correlations, such as genomic or electronic-medical-record data, are unaware of evidence-synthesis tools and their potential usefulness. Conversely, many experts in evidence synthesis are unfamiliar with the methods often used to analyse large data sets relevant to health.

We believe that the core elements of evidence synthesis must be combined with other data sciences to develop new ways to make sense of diverse data.

Managing bias

Scientists need to work out why, when and how to combine diverse data. Should physical-activity data from clinical records, online questionnaires and wearable devices be combined, for instance? They also need to grasp the risks of bias associated with each data type and incorporate such risks into their analyses. For clinical trials and observational studies of the effects of interventions, analysts can use the Cochrane Risk of Bias approach. Similar methods are needed to enable the detection and reduction of bias in other data types, such as social-networking and mobile-phone data.

Also needed are agreed ways to capture and represent information on potential sources of bias. Organizations investing in infrastructure and standards for health data, such as Health Level Seven International (HL7), need to incorporate this layer of metadata (data about data) into their systems.
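As a rough illustration of what such a metadata layer might look like, the Python sketch below attaches structured bias annotations to a description of a data source and serializes the result as JSON. Every class and field name here is a hypothetical choice made for this example. It does not reproduce any existing health-data standard, and the example record is invented.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BiasAnnotation:
    domain: str     # hypothetical category, e.g. "selection" or "measurement"
    judgement: str  # hypothetical scale, e.g. "low", "high" or "unclear"
    rationale: str  # free-text reason for the judgement


@dataclass
class DataSourceRecord:
    source_id: str
    data_type: str  # e.g. "wearable device", "clinical record"
    bias: list = field(default_factory=list)

    def to_json(self):
        """Serialize the record, including nested annotations, to JSON."""
        return json.dumps(asdict(self), indent=2)


# Invented example: step counts from a wearable device.
record = DataSourceRecord(
    source_id="wearable-01",
    data_type="wearable device",
    bias=[
        BiasAnnotation(
            domain="selection",
            judgement="high",
            rationale="Device owners may be more active than the wider population.",
        )
    ],
)
print(record.to_json())
```

Keeping the judgement and its rationale next to the data means a later analysis can decide how much weight a source deserves without re-appraising it from scratch.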

Methods to deal with bias must be incorporated into new analytical systems developed to guide decision-making in health care — including those based on natural-language processing and machine learning. Transparent and independent evaluations of these new systems will also be important, although challenging to achieve for proprietary systems such as IBM Watson.

In the short to medium term, conferences, funding programmes and a restructuring of departments in universities and institutes will be crucial to support collaborations between computational biologists, computer scientists, clinical and population-health researchers and specialists in evidence synthesis. For instance, major granting agencies should invest in dedicated research-methods programmes similar to that of the UK National Institute for Health Research. Targeted investment will also be needed to develop data infrastructure in poor regions and countries. In the long term, a new type of analyst, adept at appraising and combining diverse data types appropriately, may emerge.

Joining the dots

What could these shifts mean in practice? One of the aims of the US Precision Medicine Initiative (PMI) is to prevent people from getting cancer. This means understanding the effects of myriad genomic, behavioural and environmental factors and their interactions. The value of the initiative will be enhanced if data from these very different domains can be combined appropriately and easily.

Another aim of the initiative is to develop new cancer therapies. Better systems for data synthesis would inform drug development with richer and more accurate insights from the 'omics' sciences, animal studies and early human trials. Moreover, health-care funders such as Britain's National Health Service and Medicare in the United States could better understand a drug's benefits and harms in the real world by synthesizing data from clinical trials, cohort studies, patient experiences reported through mobile and social applications, and drug-surveillance systems. (These include the US Sentinel Initiative and the Canadian Network for Observational Drug Effect Studies, which pool data from different health-care systems to monitor the adverse effects of licensed drugs.)

We are not proposing a one-model-fits-all approach. But society does not need more islands of data analysis that support conflicting inferences. As large and diverse data sets become ever more plentiful, we must ensure that rigorous and trustworthy methods to make sense of the data are developed in parallel.

References

1. Weber, G. M., Mandl, K. D. & Kohane, I. S. J. Am. Med. Assoc. 311, 2479–2480 (2014).
2. Wysowski, D. K. & Bacsanyi, J. N. Engl. J. Med. 335, 290–291 (1996).
3. Lazer, D., Kennedy, R., King, G. & Vespignani, A. Science 343, 1203–1205 (2014).
4. Institute of Medicine. Finding What Works in Health Care: Standards for Systematic Reviews (National Academies Press, 2011).
5. Chalmers, I. Ann. Am. Acad. Pol. Soc. Sci. 589, 22–40 (2003).
6. Turner, R. M. et al. PLoS ONE 7, e30711 (2012).

Author information

Authors and Affiliations

Julian H. Elliott is senior research fellow at the Australasian Cochrane Centre at Monash University, and head of clinical research in the Infectious Diseases Unit at Alfred Hospital, Melbourne, Australia.
Jeremy Grimshaw is senior scientist at Ottawa Hospital Research Institute and professor of medicine at the University of Ottawa, Canada.
Russ Altman is professor of bioengineering and genetics at Stanford University, California, USA.
Lisa Bero is professor at the Charles Perkins Centre and Faculty of Pharmacy, University of Sydney, Australia.
Steven N. Goodman is co-director of the Meta-Research Innovation Center (METRICS) and associate dean for clinical and translational research, Stanford University, California, USA.
David Henry is professor of health systems data at the Dalla Lana School of Public Health, University of Toronto, Canada, and scientist at the Institute for Clinical Evaluative Sciences, Toronto, Canada.
Malcolm Macleod is professor of neurology and translational neurosciences at the University of Edinburgh, UK.
David Tovey is editor-in-chief and deputy chief executive of Cochrane, London, UK.
Peter Tugwell is director of the Centre for Global Health at the Institute of Population Health, University of Ottawa, Canada.
Howard White is chief executive of the Campbell Collaboration, Oslo, Norway.
Ida Sim is professor of medicine at the University of California, San Francisco, USA.

Corresponding author

Correspondence to Julian H. Elliott.

Ethics declarations

Competing interests

D.T. is an employee of, and J.H.E., J.G., P.T. and L.B. are contributors to, Cochrane, an independent, non-profit organization engaged in the development of health evidence for decision-making.

Issue Date: 05 November 2015
