The most common and pervasive human health problems are caused by diseases with complex etiologies. Humans differ greatly in their genetic vulnerability to these common diseases. Mechanisms that underlie disease susceptibility and progression are, with few exceptions, influenced by numerous genetic, developmental and environmental factors. Progress toward treatment and prevention will require new approaches to the genetic analysis of complex systems1. We believe that computational, statistical and genomic resources are now sufficiently mature to address these mechanisms in the context of an experimental model system that more accurately reflects the genetic structure of human populations2,3. Global analysis of complex biological systems can be implemented most efficiently using experimental designs that employ multifactorial perturbations4,5. A well-designed model-organism resource for a new synthetic phase of genetic studies will have a direct, positive and long-lasting impact on the diagnosis and treatment of common and chronic human disease, including cancer, pulmonary and cardiovascular diseases, obesity and diabetes, behavioral disorders and neurodegenerative diseases.

The Complex Trait Consortium will provide the research community with unique resources needed to discover and dissect the exact contributions of genetic, environmental and developmental components to the etiology of common, complex human diseases. Here we propose a new community resource called the Collaborative Cross that is designed to support the development of integrated models of complex traits. A number of mouse resources for the genetic dissection of disease-related traits already exist, and new ones are being developed. The Collaborative Cross is not a competing resource but rather a complementary approach to ongoing efforts. It will provide a unique and powerful mechanism for functional studies of biological networks that will be essential for understanding the intricacies of disease processes.

What is the Collaborative Cross?

The Collaborative Cross is a large panel of recombinant inbred (RI) strains derived from a genetically diverse set of founder strains and designed specifically for complex trait analysis (R.W.W. et al., unpublished data). It is a community effort that will generate a comprehensive body of genetic and physiological data derived from a common, reproducible and stable population. This contrasts with current efforts to exploit the mouse as a model organism, which rely on isolated and transient crosses. By providing a large, common set of genetically defined mice, the Collaborative Cross will become a focal point for cumulative and integrated data collection (Fig. 1), giving rise to a new view of the mammalian organism as a whole and interconnected system.

Figure 1: The Collaborative Cross as an integrating mechanism for multiple, diverse phenotypic assays.

RI mouse strains are an ideal genetic resource. They can produce unlimited numbers of genetically identical mice that can be exposed to various experimental perturbations and interventions. Similar resources have been or are being developed in maize, Arabidopsis thaliana and Drosophila melanogaster6. The reproducibility of RI strains is essential to investigating traits with low heritability or the effects of multiple environments7. Many traits important to human health, such as phenotypes related to aging, cancer and behavior, cannot be assayed using single mice. The RI strains will be genotyped only once, and this initial investment will be recovered with every new experiment. A large set of RI strains could support studies that incorporate multiple genetic, environmental and developmental variables into comprehensive statistical models of disease susceptibility, pathophysiology and progression.

Inbred mouse strains have a large amount of natural variation, and it has been suggested that the variation among common inbred strains obviates the need for the large RI set envisioned here. But several factors limit our ability to exploit existing inbred strains as a reference panel. First, the number of readily available strains is too small to provide reliable statistical support for most genetic effects and phenotypic correlations. Second, the genetics of these mice is confounded by a complex and uncertain history that prevents inference of causation. In contrast, controlled randomization of genetic factors, essential to causal inference, will be achieved in the construction of the Collaborative Cross. Genetic background is a greater source of nonlethal phenotypic diversity than mutagenesis or knockout transgenics. The common occurrence of transgressive segregation and new traits, seen only in the progeny of crosses, indicates that a vast potential for phenotypic diversity is hidden away in the common inbred strains and that this diversity can be expressed when these genomes are mixed in many different combinations. The Collaborative Cross will bring this hidden diversity to the fore and make it available to experimental manipulation.

How will it be implemented?

The Collaborative Cross is a panel of eight-way RI strains that are independently derived from a mating scheme that minimizes unpredictable genomic interactions between strains and optimizes the contribution from each parental strain. The breeding design is efficient and modular: the genomes of eight founder strains are rapidly combined and are then inbred to produce finished RI strains (Fig. 2). Eight-way RI strains achieve 99% inbreeding by generation 23. Each Collaborative Cross strain (Fig. 3) will capture 135 unique recombination events (K.W.B., unpublished data). With genetic contributions from multiple parental strains, including several wild derivatives, the Collaborative Cross will capture an abundance of genetic diversity and will retain segregating polymorphisms every 100–200 bp. This level of genetic diversity will be sufficient to drive phenotypic diversity in almost any trait of interest. Generating a large number of such strains will guarantee high mapping resolution and sample sizes sufficient for detecting extended networks of epistatic and gene-environment interactions.
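The rate at which inbreeding accumulates can be illustrated with the textbook recurrence for repeated full-sib mating, F_t = (1 + 2F_{t-1} + F_{t-2})/4. The following is a minimal sketch under that simplified model only: it assumes unrelated, non-inbred founders and does not model the outcross generations of the eight-way funnel, and the function names are illustrative rather than part of any Collaborative Cross software.

```python
def sib_mating_inbreeding(generations):
    """Return [F_0, ..., F_generations] under repeated full-sib mating.

    Assumption: founders (t = 0) are unrelated and non-inbred, so F_0 = F_1 = 0.
    """
    f = [0.0, 0.0]
    for _ in range(2, generations + 1):
        # Textbook full-sib recurrence: F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4
        f.append((1.0 + 2.0 * f[-1] + f[-2]) / 4.0)
    return f[: generations + 1]


def first_generation_reaching(threshold, max_generations=200):
    """First generation t at which F_t >= threshold, or None if never reached."""
    for t, f_t in enumerate(sib_mating_inbreeding(max_generations)):
        if f_t >= threshold:
            return t
    return None


if __name__ == "__main__":
    for t, f_t in enumerate(sib_mating_inbreeding(25)):
        print(f"generation {t:2d}: F = {f_t:.4f}")
```

Under these assumptions the coefficient rises steadily toward 1. How the index t maps onto the G labels of the funnel scheme depends on where sib mating begins, so the sketch should be read as an order-of-magnitude check rather than a reproduction of the breeding design.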

Figure 2: The eight-way 'funnel' breeding scheme for generating RI strains. G, generation.

Figure 3: A typical eight-way RI strain. Color scheme indicates the parental origin of genomic segments.

We estimate that 1,000 strains will be required. This estimate is based on the statistical power necessary to detect biologically relevant correlations among thousands of measured traits. A set of 1,000 strains containing 135,000 recombination events is a far more powerful and flexible research tool than a set of 100 strains containing the same total number of recombination events. Epistatic and environmental interactions also influence this estimate. Mounting evidence suggests that gene-gene interactions are crucial in many complex disease etiologies8,9,10. A set of 1,000 strains will readily support simultaneous mapping of many two-way and three-way epistatic interactions.
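To give a feel for why panel size matters when correlating many traits, the sketch below computes the smallest correlation detectable at a given significance level and power, using the standard Fisher z approximation. It is not the calculation behind the estimate of 1,000 strains: the number of traits, the Bonferroni correction and the 80% power target are all assumptions chosen for illustration.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_correlation(n_strains, alpha, power=0.8):
    """Smallest |r| detectable with the given two-sided alpha and power.

    Uses the Fisher z approximation: atanh(r) has standard error 1/sqrt(n - 3).
    """
    if n_strains <= 3:
        raise ValueError("need more than 3 strains")
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return tanh((z_alpha + z_power) / sqrt(n_strains - 3))


if __name__ == "__main__":
    n_traits = 1000  # hypothetical number of measured traits
    n_tests = n_traits * (n_traits - 1) // 2  # all pairwise trait correlations
    alpha = 0.05 / n_tests  # Bonferroni correction (an assumed, conservative choice)
    for n in (100, 200, 1000):
        r = min_detectable_correlation(n, alpha)
        print(f"{n:5d} strains: |r| >= {r:.2f}")
```

Because the standard error shrinks with the square root of the number of strains, a tenfold larger panel brings much weaker trait correlations within reach at the same multiple-testing threshold.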

So that this resource may be fully used, strains of mice, tissues, genotypes and community-acquired phenotypes will be available without restriction at a realistic cost to investigators. Adequate cost recovery will be assured. In this regard, not all applications will require phenotyping the entire set of 1,000 strains. Simulation studies suggest that considerable efficiencies can be obtained with a two-stage mapping strategy (W.V., unpublished data). For example, an initial mapping panel of 100 strains, selected to have an optimal allele distribution, can be used as a foundation panel to obtain rough localization of quantitative trait loci. Second-stage mapping takes further advantage of the full panel by using an additional selected subset of 100 strains containing informative recombination events and allelic combinations. This two-stage strategy can achieve mapping resolution that is much higher than could be achieved with a fixed set of 200 strains, provided that the selected strain sets come from a sufficiently large panel11.
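The second-stage selection step can be pictured as a simple filter over the genotyped panel. The sketch below is only an illustration of that idea, not the simulated strategy cited above: it assumes each strain's recombination breakpoints are available as positions in Mb on a single chromosome, and the strain names, positions and function name are hypothetical.

```python
def select_second_stage(breakpoints_by_strain, interval, exclude=(), max_strains=100):
    """Pick strains whose recombination breakpoints fall inside a coarse QTL interval.

    breakpoints_by_strain: dict mapping strain name -> list of breakpoint positions (Mb)
    interval: (start, end) of the first-stage QTL region, in Mb
    exclude: strains already phenotyped in the first stage
    """
    start, end = interval
    skip = set(exclude)
    scored = []
    for strain, positions in breakpoints_by_strain.items():
        if strain in skip:
            continue
        n_inside = sum(start <= p <= end for p in positions)
        if n_inside > 0:
            scored.append((n_inside, strain))
    # Most informative strains first; ties broken by name for reproducibility.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [strain for _, strain in scored[:max_strains]]


if __name__ == "__main__":
    toy_panel = {  # hypothetical breakpoints on one chromosome, in Mb
        "CC001": [12.0, 48.5, 90.1],
        "CC002": [51.2, 53.9],
        "CC003": [5.4, 70.0],
        "CC004": [49.0],
    }
    print(select_second_stage(toy_panel, interval=(45.0, 55.0), exclude=["CC004"]))
```

A real selection would also weigh allelic combinations at the candidate locus, as the text notes. The point of the sketch is that the larger the genotyped panel, the more strains carry a breakpoint inside any given interval.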

In addition to the RI strains themselves, it will be possible to generate an essentially unlimited combinatorial diversity of F1 progeny of the RI strains12 (RIX). These RIX mice will have reproducible genotypes with low formal inbreeding coefficients, more representative of the genetic state of human populations. A set of 1,000 RI strains can generate as many as one million distinct but genetically well-defined and reproducible mice that will represent a vast resource for the discovery of new animal models of human diseases. RIX mice are not the 'unusual' fully inbred creatures that mouse geneticists commonly study. With levels of heterozygosity and admixture comparable to those of human populations, RIX mice are genetically normal mammals that can be replicated ad infinitum.
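The size of the RIX space follows from counting pairs of distinct RI strains. As a minimal sketch (the function name is illustrative, and whether reciprocal crosses are counted separately is an assumption the text does not spell out):

```python
from math import comb


def rix_counts(n_strains):
    """Count distinct F1 crosses between different RI strains."""
    unordered = comb(n_strains, 2)  # ignoring which strain is the dam
    ordered = 2 * unordered         # counting reciprocal crosses separately
    return unordered, ordered


if __name__ == "__main__":
    unordered, ordered = rix_counts(1000)
    print(f"unordered pairs: {unordered:,}")
    print(f"with reciprocal crosses: {ordered:,}")
```

For 1,000 strains the count that includes reciprocal crosses is just under one million, which appears consistent with the figure quoted above.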

The effort required to fully genotype the Collaborative Cross will be undertaken only once. All recombination events will be located as precisely as possible given the availability of markers. Each of the eight parental strains will be sequenced, effectively providing complete genomic sequence information for all 1,000 strains and for one million potential RIX progeny. The availability of mice from a fully genotyped panel will greatly reduce the barrier to entry for new studies, particularly for nongeneticists.

What kind of science will it enable?

Access to a large and diverse panel of RI strains will yield precisely mapped candidate regions, circumventing the most laborious steps of analyzing quantitative trait loci. Of even greater importance is the greatly enriched analysis of gene pleiotropy and molecular networks that will become possible only with access to a large reference panel of RI strains. The completely characterized genetic structure of the RI strains will enable us to establish networks of functionally important relationships within and among diverse sets of physiological and behavioral phenotypes13,14.

Gene-environment interaction is a crucial problem in genetics and a key contributor to most common diseases, yet it has been difficult to study in any mammalian population. Gene-pathogen interactions are also of enormous importance to human health. Analysis of gene-environment interaction requires the use of isogenic strains that can be studied in large numbers in different environments and usually over a period of years15. Mouse experimental geneticists usually have adequate control over many environmental factors but have not had sufficiently large isogenic mapping panels. The Collaborative Cross solves this problem by providing a large collection of isogenic strains that can be assayed under different sets of environmental conditions.

Predictive genetics is a form of individualized medicine in which genetic information is used to predict the future health of an individual. With the combinatorial diversity available in the Collaborative Cross and the potential RIX genomes, it will be possible to validate statistical models relating genotypes and environments to phenotypic outcomes. Software tools developed to predict phenotypes of RIX mice will become the prototypes for applications to human health. The application of predictive genetics to humans will obviously involve greater uncertainties16, but the principles must be developed in a mammalian model system with sufficient multilocus genetic diversity to accurately represent the human genetic state. The Collaborative Cross will provide this system.

What will it cost?

At current estimates, generating a finished set of 1,000 Collaborative Cross strains will cost $3.5 million per year over the 8-year breeding period (R.W.W. and G.A.C., unpublished data). This cost estimate includes genotyping for quality control, mouse tracking and related informatics, professional mouse care technicians, and a limited amount of phenotyping. Many extra mice will be generated during the breeding phase. These mice could be genotyped and provided to investigators at a modest fee, generating substantial data during the breeding phase.

Once breeding is complete, we will genotype and cryopreserve the finished strains. This one-time expenditure might be as much as $3 million in 2004 dollars but will certainly be much lower in 2012. Cryopreservation will ensure a small number of recoveries for each strain. The costs of breeding and cryopreservation will be streamlined, and no heroic efforts will be made to retain uncooperative strains.

Maintenance and distribution of the Collaborative Cross will require careful advance planning and a long-term commitment from one or more breeding centers. Current practice for large mouse RI panels is to maintain each strain in eight cages. At this level, the cost of housing 1,000 strains will be roughly $1.5 million per year, and each strain will produce 120 progeny per year. If we add a $2 surcharge per mouse for subdermal identification tags and $4 for shipping, the final cost of each mouse is less than $20 (in 2004 dollars).
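The per-mouse figure can be reproduced from the numbers in the preceding paragraph. This is a back-of-the-envelope sketch that assumes housing is the only recurring cost spread across annual production; the function name is illustrative.

```python
def cost_per_mouse(housing_per_year, n_strains, progeny_per_strain, tag_fee, shipping_fee):
    """Housing cost spread over annual production, plus per-mouse surcharges (USD)."""
    mice_per_year = n_strains * progeny_per_strain
    return housing_per_year / mice_per_year + tag_fee + shipping_fee


if __name__ == "__main__":
    total = cost_per_mouse(1_500_000, 1000, 120, 2, 4)
    print(f"${total:.2f} per mouse")
```

With 120,000 mice produced per year, housing contributes $12.50 per mouse, and the two surcharges bring the total to $18.50.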

Phenotyping costs will represent a large portion of the total expenditure required to use the Collaborative Cross. Phenotyping centers, many established for mutagenesis projects, are already actively testing thousands of mice each year. Ideally, one or more such centers will collect, curate and distribute large amounts of phenotype data. Acquisition of high-quality data on the transcriptomes, proteomes and metabolomes of key organ systems will motivate the adoption of the Collaborative Cross. High-throughput profiling methods will be much more affordable by the time the RI strains are ready for use and distribution. It will be possible to investigate complex traits with extensive data on transcripts and proteins in several organ systems, greatly augmenting the development of molecular networks.

How many groups of scientists can we expect to use the Collaborative Cross? Interest in the cross will increase as we generate phenotypes, complete genotypes and knowledge of disease susceptibilities among the Collaborative Cross strains. Based on current levels of interest, we estimate that by 2012, more than 100 groups of researchers would be ready, willing and able to use the Collaborative Cross. With 100 subscribing investigators each purchasing 1,200 mice per year, the Collaborative Cross could be self-supporting and would cost less to maintain than an equivalent production of intercross mice.
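The subscription scenario can be checked against the production figures quoted earlier in this section (1,000 strains, 120 progeny per strain per year). The sketch below is simple bookkeeping under those stated numbers; the function name is illustrative.

```python
def annual_balance(n_strains, progeny_per_strain, n_groups, mice_per_group):
    """Return (mice produced, mice requested, surplus) per year."""
    produced = n_strains * progeny_per_strain
    requested = n_groups * mice_per_group
    return produced, requested, produced - requested


if __name__ == "__main__":
    produced, requested, surplus = annual_balance(1000, 120, 100, 1200)
    print(f"produced {produced:,}, requested {requested:,}, surplus {surplus:,}")
```

In this scenario the 100 subscribing groups would together take up the entire annual production of 120,000 mice, which is the condition under which housing costs are fully recovered through sales.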

Strains of mice, tissues, genotypes and community-acquired phenotypes must be available without restriction, with adequate cost recovery and subject to availability. Mouse facilities and resource populations maintained at distribution research centers could give visiting researchers access to mice and equipment for phenotyping, eliminating the requirement to have large on-site animal facilities. A visitor program at one of several distribution centers would open mammalian genetic analysis to many researchers who previously could not contemplate this scale of science because of a lack of local resources.

Cost is crucial. These rough assessments indicate that the Collaborative Cross will be economically viable. The more use it gets, the more valuable it will become.

Conclusions

Now is the time to begin the development of the resources that will be required to fully exploit the genetic power of the mouse. With the best current breeding practices, it will take 7–8 years to derive new mouse strains. Once developed, however, each strain represents an eternal resource that can be used repeatedly to accumulate data. The integration of data, information and knowledge about multiple, diverse phenotypes is the defining feature of the Collaborative Cross. Of course, this approach requires a large number of genomes, each with its own unique combinations of alleles. With a set of 1,000 fully genotyped RI strains and more than one million potential isogenic and completely defined F1 hybrids, we will be able to model human populations and disease processes better. A fixed set of genomes, interrogated by a community of researchers, will enable an enormous and systematic accumulation of data on the complex interplay of genes and environments that will support a previously untenable unifying theory of mammalian biology.