Over the past 50 years or so, studies of individual genes and proteins have revealed the fundamental aspects of development that we recognize today. But the recent increase in high-throughput screens and the accumulation of gene-expression and protein-interaction data sets indicate that we are not going to study single genes and proteins in isolation for much longer. Kristin Gunsalus and colleagues now predict system-level models of molecular machines that function in early development, based on interactome, transcriptome and phenotypic data sets.

But what is a suitable system in which to develop such predictive models and then evaluate them? The nematode Caenorhabditis elegans has paved the way for high-throughput approaches, because processes involved in early embryogenesis, such as cell division and polarity, are amenable to large-scale functional analysis. The authors generated network graphs in which each node represents an early embryogenesis gene and its product(s), and each edge represents a potential functional connection based on 6,572 binary physical interactions between 3,848 C. elegans proteins, as well as on expression profiling and phenotypic similarity.

Using an algorithm, they then identified densely interconnected regions in a portion of the network (305 nodes joined by 1,036 edges, each supported by two or three types of functional evidence) and generated two types of model that represent the higher-level organization of the networks that underlie early embryogenesis. The first type contains links that are supported by both physical interactions and phenotypic correlations, and represents molecular complexes that constitute distinct molecular machines within the cell, such as the ribosome, the proteasome and the anaphase-promoting complex. By contrast, the second type contains few protein interactions and is dominated by edges that are supported by both phenotypic and expression correlations; examples include genes involved in mRNA and protein metabolism, chromosome maintenance and meiosis.
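
The filtering and grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' method: the gene names and evidence sets are invented placeholders, connected components stand in for the unnamed algorithm that finds densely interconnected regions, and the 50% cut-off used to separate the two model types is a hypothetical choice.

```python
from collections import defaultdict

# Hypothetical toy evidence: gene names are placeholders, not data from the study.
EVIDENCE = {
    ("geneA", "geneB"): {"physical", "phenotype"},
    ("geneB", "geneC"): {"physical", "phenotype", "expression"},
    ("geneA", "geneC"): {"physical"},              # one evidence type only
    ("geneD", "geneE"): {"phenotype", "expression"},
    ("geneE", "geneF"): {"phenotype", "expression"},
    ("geneF", "geneG"): {"expression"},            # one evidence type only
}


def multiply_supported(evidence, min_types=2):
    """Keep only edges backed by at least `min_types` kinds of evidence."""
    return {pair: kinds for pair, kinds in evidence.items()
            if len(kinds) >= min_types}


def connected_components(edges):
    """Group genes into components (a simple stand-in for dense-region finding)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        components.append(members)
    return components


def classify(members, edges, physical_cutoff=0.5):
    """Label a group by the share of its edges that have physical evidence.

    The cut-off is an assumption made for this sketch.
    """
    inside = [kinds for (a, b), kinds in edges.items()
              if a in members and b in members]
    physical_share = sum("physical" in kinds for kinds in inside) / len(inside)
    if physical_share >= physical_cutoff:
        return "complex-like"
    return "coordinated-process-like"


def summarize(evidence):
    """Filter edges, group genes, and label each group."""
    edges = multiply_supported(evidence)
    return [(sorted(members), classify(members, edges))
            for members in connected_components(edges)]


if __name__ == "__main__":
    for genes, label in summarize(EVIDENCE):
        print(label, genes)
```

In this toy input, the group held together by physical and phenotypic links is labelled as complex-like, whereas the group linked only by phenotypic and expression correlations falls into the second category.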

To assess the predictive value of these models, the authors selected ten previously uncharacterized genes and analysed their potential participation in a molecular machine by visualizing their dynamic subcellular localization in vivo using green fluorescent protein (GFP)-tagged proteins. They tested proteins with connections to three different early embryogenesis models of the second type (centrosomal function, cell polarity, and a molecular network involved in DNA replication, chromatin architecture and nucleocytoplasmic transport) and generated supporting evidence for the genes as potential new components of these molecular machines.
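
One way to picture how candidates for such tests might be shortlisted is to rank uncharacterized genes by how many network links they have into a given model. The sketch below is hypothetical: the function name, the gene identifiers and the ranking rule are assumptions made for illustration, and the criteria that the authors actually used to choose their ten genes are not described here.

```python
def shortlist_candidates(edges, module, characterized):
    """Rank genes outside `module` by their number of links into it.

    Genes listed in `characterized` are skipped. This ranking rule is an
    illustrative assumption, not the selection procedure used in the study.
    """
    counts = {}
    for a, b in edges:
        for gene, partner in ((a, b), (b, a)):
            if (partner in module and gene not in module
                    and gene not in characterized):
                counts[gene] = counts.get(gene, 0) + 1
    # Most links first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network: all identifiers are placeholders.
MODULE = {"m1", "m2", "m3"}
EDGES = [
    ("x1", "m1"), ("x1", "m2"),   # x1 has two links into the module
    ("x2", "m3"),                 # x2 has one
    ("k1", "m1"),                 # k1 is already characterized
    ("m1", "m2"),                 # link within the module
    ("x3", "y1"),                 # unrelated link
]

if __name__ == "__main__":
    print(shortlist_candidates(EDGES, MODULE, characterized={"k1"}))
```

A gene that ranks highly in such a list is only a prediction; as in the study, it would still need experimental support, for example from localization data.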

The authors propose that this integrated network comprising two types of model is a potential reservoir for hundreds of testable predictions about cellular processes in the early embryo. Most importantly, this approach is scalable and could be applied not only to other biological processes, but also to more complex organisms.