A powerful form of machine learning that enables computers to solve perceptual problems such as image and speech recognition is increasingly entering the biological sciences. Deep-learning methods, such as deep artificial neural networks, use multiple processing layers to discover patterns and structure in very large data sets. Each layer learns a concept from the data that subsequent layers build on; the higher the layer, the more abstract the concepts it learns. Deep learning does not depend on features prepared by hand in advance; it extracts features automatically from the data. To use a simple example, a deep neural network tasked with interpreting shapes would learn to recognize simple edges in the first layer and then, in subsequent layers, the more complex shapes composed of those edges. There is no hard and fast rule for how many layers constitute deep learning, but most experts agree that more than two are required.
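The edges-then-shapes idea can be illustrated with a toy sketch. It makes strong simplifying assumptions: the input is a one-dimensional binary signal rather than an image, and the weights are set by hand rather than learned, so it shows only how a second layer can build on features detected by the first. The function names (`conv1d`, `layer1`, `layer2`) are illustrative and not taken from any of the tools discussed here.

```python
def relu(values):
    """Rectified linear unit: keep positive activations, zero out the rest."""
    return [max(0.0, v) for v in values]


def conv1d(signal, kernel, bias=0.0):
    """Slide a small filter along the signal (valid positions only)."""
    k = len(kernel)
    return [
        sum(w * signal[i + j] for j, w in enumerate(kernel)) + bias
        for i in range(len(signal) - k + 1)
    ]


def layer1(signal):
    """First layer: detect simple edges (0 -> 1 rises and 1 -> 0 falls)."""
    rising = relu(conv1d(signal, [-1.0, 1.0]))
    falling = relu(conv1d(signal, [1.0, -1.0]))
    return rising, falling


def layer2(rising, falling, width):
    """Second layer: combine edges into a 'bar' of the given width.

    A unit fires only when a rising edge at position i is followed by a
    falling edge at position i + width (the -1.0 bias acts as a soft AND).
    """
    n = len(rising) - width
    return relu([rising[i] + falling[i + width] - 1.0 for i in range(n)])


# Hand-set weights, toy input: a bar of three 1s inside a run of 0s.
signal = [0, 0, 1, 1, 1, 0, 0, 0]
rising, falling = layer1(signal)
bars = layer2(rising, falling, width=3)
# bars == [0.0, 1.0, 0.0, 0.0]: the bar is found where its rising edge sits.
```

In a real deep network the filters would be learned from data by gradient descent rather than written down, and there would typically be many more filters and layers.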

[Figure: Computation that leads from sequence to functional annotation.]

Recent examples show the power of deep learning to derive regulatory features in genomes from DNA sequence alone. DeepSEA (Nat. Methods 12, 931–934, 2015) takes genomic sequence as input, trains on chromatin profiles from large consortia such as ENCODE and Roadmap Epigenomics, and predicts the effect of single-nucleotide variants on regulatory features such as DNase I hypersensitive sites, transcription factor–binding sites and histone marks. Basset (bioRxiv, 10.1101/028399, 2015) uses similar deep neural networks to predict the effect of single-nucleotide polymorphisms on chromatin accessibility. DeepBind (Nat. Biotechnol. 33, 831–838, 2015) finds protein-binding sites on RNA and DNA and predicts the effects of mutations.
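A common way to score a variant with a sequence-based model is to predict on the reference sequence, predict again with the alternative allele substituted, and compare the two outputs. The sketch below illustrates that pattern under stated assumptions. It does not reproduce DeepSEA, Basset or DeepBind; `toy_model` is a hypothetical stand-in for a trained network that simply scores the best match to a fixed "TATA" motif, and all function names are illustrative.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values (A, C, G, T).

    Any other character (for example N) becomes an all-zero row.
    """
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def toy_model(encoded):
    """Hypothetical stand-in for a trained network.

    Returns the best fractional match to the motif 'TATA' over all
    windows, a value between 0.0 and 1.0.
    """
    motif = one_hot("TATA")
    best = 0.0
    for start in range(len(encoded) - len(motif) + 1):
        matches = sum(
            encoded[start + i][j] * motif[i][j]
            for i in range(len(motif))
            for j in range(len(BASES))
        )
        best = max(best, matches / len(motif))
    return best


def variant_effect(seq, pos, alt, model=toy_model):
    """Score a single-nucleotide variant as model(alt) - model(ref).

    pos is a 0-based index into seq; alt is the substituted base.
    """
    ref_score = model(one_hot(seq))
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    alt_score = model(one_hot(alt_seq))
    return alt_score - ref_score


# A variant that disrupts the motif lowers the score; one outside it does not.
disruptive = variant_effect("GGTATAGG", 3, "C")  # -0.25
neutral = variant_effect("GGTATAGG", 0, "C")     # 0.0
```

With a real model, `toy_model` would be replaced by the trained network's prediction function, and the output would usually be a vector of scores (one per chromatin feature) rather than a single number.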

Deep learning will be invaluable for big data because it extracts high-level information from very large volumes of data. As it gains traction in genome analysis, initial challenges such as overfitting to rare dependencies in the training data and high computational costs are being tackled. Researchers in academic settings as well as in startup companies such as Deep Genomics, launched on July 22, 2015, by some of the authors of DeepBind, will increasingly apply deep learning to genome analysis and precision medicine. The goal is to predict the effect of genetic variants, both naturally occurring and introduced by genome editing, on a cell's regulatory landscape and how this in turn affects disease development.