To speed up the discovery of disease biomarkers and treatments, we must work out cheaper and faster ways to process, store and use the huge medical data sets that are rapidly becoming available (see, for example, Nature 506, 144–145; 2014).

By 2015, it is likely that a typical hospital will create 665 terabytes of data a year (for comparison, the web archive of the US Library of Congress contains less than 500 terabytes). This information can be used to study and analyse treatments — for example, for tuberculosis and stroke — and to reduce health-care costs.

To handle such big data effectively (see also Nature 498, 255–260; 2013), we need to adapt classical information-processing tools. One computational challenge is how to manage the huge volume of detailed material as it becomes available, without sacrificing information. Another is that the data mostly represent physiological processes, the characteristics of which change over time.
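As a rough illustration of both challenges (this is a sketch of my own, not a method described above; the exponentially weighted summary, the decay constant and the heart-rate values are illustrative assumptions), a stream of measurements can be summarised incrementally, so that the full record never has to be held in memory, while older samples are gradually down-weighted to follow a process whose characteristics drift over time:

```python
class StreamingSummary:
    """Exponentially weighted running mean and variance of a data stream."""

    def __init__(self, decay=0.99):
        self.decay = decay      # closer to 1.0 -> longer memory
        self.mean = 0.0
        self.var = 0.0
        self.initialised = False

    def update(self, x):
        if not self.initialised:
            self.mean, self.var, self.initialised = x, 0.0, True
            return
        # Recent samples dominate, so the summary tracks slow changes
        # in the underlying physiological process.
        delta = x - self.mean
        self.mean += (1.0 - self.decay) * delta
        self.var = self.decay * (self.var + (1.0 - self.decay) * delta * delta)


# Usage: feed readings one at a time, e.g. from a bedside monitor.
summary = StreamingSummary(decay=0.99)
for reading in [72, 74, 71, 90, 95, 97]:   # made-up heart-rate values
    summary.update(reading)
print(summary.mean, summary.var)
```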

Current methods are also inadequate for analysing collective information from different sensors, such as multi-dimensional descriptions of interactions between brain regions derived from electroencephalography or magnetic resonance imaging.
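A minimal sketch of the kind of analysis meant here (my own illustration, not a technique endorsed above; the window length and the toy signals are arbitrary assumptions) is to track how the coupling between two recording channels, say two electroencephalography electrodes, changes over time by computing the correlation inside a sliding window:

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def sliding_correlation(a, b, window=50):
    # One correlation value per window: a crude picture of how the
    # interaction between the two channels evolves over time.
    return [pearson(a[i:i + window], b[i:i + window])
            for i in range(0, len(a) - window + 1, window)]


# Toy data: channel_b follows channel_a for the first half, then decouples.
random.seed(0)
channel_a = [math.sin(0.1 * t) + 0.1 * random.gauss(0, 1) for t in range(200)]
channel_b = [channel_a[t] + 0.1 * random.gauss(0, 1) if t < 100
             else random.gauss(0, 1) for t in range(200)]
print(sliding_correlation(channel_a, channel_b))
```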