Detecting variations in the phenotypes of biological samples, be they cells or model organisms, typically involves distinguishing visual patterns. What does the cellular localization of a protein look like upon treatment with signaling modulators? What happens to the morphology of a fly's head or the locomotion of a mouse when specific genes are mutated or knocked down?

Distinguishing such patterns by eye becomes tedious when the number of samples to be examined is very large. But computers can be taught to distinguish biological patterns too.

Machine learning in biology typically involves training an algorithm, or letting it train itself, to automatically classify samples into phenotypic groups. To do this, the classifier must learn which features in the image data are informative for distinguishing the different phenotypic classes. The image data may be static or dynamic, and may depict cells, model organisms or any other imaged sample. Phenotypes can be assigned on the basis of prior annotation by an experimenter (supervised learning) or can be identified by the classifier itself in an unsupervised fashion. The result is an automated classifier that should accurately distinguish phenotypes of interest in a test sample.
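To make the supervised case concrete, here is a minimal sketch of a nearest-centroid classifier, one of the simplest ways a program can learn phenotypic classes from annotated examples. It is an illustration only, not the method used in any study mentioned here. The two features (a nuclear-to-cytoplasmic intensity ratio and a scaled cell area), the class names and all the numbers are hypothetical, and the sketch assumes that features have already been extracted from the images and put on comparable scales.

```python
from math import dist
from statistics import fmean


def train_centroids(features, labels):
    """Average the annotated feature vectors of each class into one centroid."""
    grouped = {}
    for vec, label in zip(features, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: tuple(fmean(column) for column in zip(*vecs))
        for label, vecs in grouped.items()
    }


def classify(centroids, vec):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


# Hypothetical training set annotated by an experimenter.
# Each vector: (nuclear/cytoplasmic intensity ratio, cell area in arbitrary units)
train_x = [(0.2, 1.0), (0.3, 1.2), (0.25, 0.9), (1.8, 1.1), (2.0, 1.3), (1.7, 1.0)]
train_y = ["cytoplasmic", "cytoplasmic", "cytoplasmic", "nuclear", "nuclear", "nuclear"]

centroids = train_centroids(train_x, train_y)

# Held-out test samples with known phenotypes, used only to check accuracy.
test_x = [(0.28, 1.1), (1.9, 1.2)]
test_y = ["cytoplasmic", "nuclear"]

predicted = [classify(centroids, vec) for vec in test_x]
accuracy = sum(p == t for p, t in zip(predicted, test_y)) / len(test_y)
print(predicted, accuracy)
```

Real image classifiers typically work with many more features and more flexible models, but the workflow is the same: learn from annotated samples, then check accuracy on samples the classifier has not seen.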

Automated classifiers are particularly well suited to large-scale screens, in which tens of thousands of samples, if not more, must be examined. Genome-scale methods to generate mutations or gene knockdowns are readily available. In some systems, comprehensive mutant collections already exist. The bottleneck in large-scale screens lies at the phenotyping stage, and automated classifiers are likely to help.

Figure: A computer distinguishes samples with different phenotypes. Credit: Marina Corral

In a recent example, Lu and colleagues trained classifiers to automatically identify Caenorhabditis elegans with altered synaptic morphology (Nat. Methods 9, 977–980, 2012). By running the classifier in real time as the worms passed through a microfluidic chip, the researchers achieved entirely automated forward genetic screening 100 times faster than would be possible manually.

Quantitative phenotyping, in which subtle phenotypes must be distinguished or many individuals must be monitored to gather sufficient data for statistical analysis, can also benefit from machine learning. Classifiers can be trained to detect behavioral phenotypes (in model organisms, for instance) and used to automatically map the sequence of these events in many hours of video recordings.
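As a rough illustration of how per-frame classifier output can be turned into a sequence of behavioral events, the sketch below collapses consecutive frames that share a label into timed bouts. The behavior labels, the frame rate and the function name `frames_to_bouts` are hypothetical, and the sketch assumes a classifier has already assigned one label to every video frame.

```python
from itertools import groupby


def frames_to_bouts(frame_labels, fps):
    """Collapse per-frame labels into (label, start_seconds, end_seconds) bouts."""
    bouts = []
    index = 0  # index of the first frame of the current run
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        bouts.append((label, index / fps, (index + length) / fps))
        index += length
    return bouts


# Hypothetical classifier output for 12 frames recorded at 2 frames per second.
frame_labels = ["walk"] * 4 + ["groom"] * 6 + ["walk"] * 2
bouts = frames_to_bouts(frame_labels, fps=2)

for label, start, end in bouts:
    print(f"{label}: {start:.1f}-{end:.1f} s")
```

Summaries such as the number of bouts or the total time spent in each behavior can then be computed from this list and compared across many individuals.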

Powerful though it may be, machine learning in biology still faces challenges. But as the tools that let biologists implement this approach improve, it is possible that phenotyping will slowly cease to be the bottleneck in biological analysis.