To the editor:

Collins argues for a large population-based prospective cohort study in the US to assess the role of genes and environment in common diseases1. He maintains that, without such a study, the promise of genomic research for improving population health will remain out of reach. This study is worthy of serious consideration, but it will be expensive, will take years to implement and will not guarantee the translation of human genome discoveries into population health benefits. Here, I contend that what is urgently needed is a coordinated global initiative to carry out and synthesize human genome epidemiologic research worldwide. I discuss three needs driving this initiative and argue that this effort could accelerate translation of human genome discoveries into population health benefits.

First, we need global collaboration in population genomic cohort studies. Advances in genomics have inspired the development of large longitudinal studies of entire populations to establish repositories of biological materials (e.g., UK Biobank and Iceland)2. Collaboration across these cohort studies is crucial both to validate initial findings, thereby minimizing false alarms, and to increase statistical power to detect gene-environment interactions, especially for rarer health outcomes. Because a large number of false positive associations (type I errors) between health outcomes and genetic variants is expected, results from hypothesis-generating studies will have to be validated by hypothesis testing across sites.

The problem of type II errors or poor statistical power is even more challenging. Consider for a moment the staggering implication of the interactions of numerous gene variants and their products. Let us assume that for a common disease only ten genes contribute a substantial population attributable fraction. Even if variation at each locus can be classified dichotomously (susceptible versus nonsusceptible genotype), this will create 2^10 (>1,000) possible strata. Classification based on just 20 genes will produce more than one million strata. This is methodologically challenging, especially considering interactions of these genes with other genes and environmental factors2. No single cohort study, no matter how large, will have adequate power to detect gene-environment interaction for numerous gene variants, especially for rarer health outcomes. Appropriate pooled analyses will increase the chance of finding true associations of relevance to public health. The full potential of cohort studies to shed light on the occurrence of complex diseases will probably be realized only by pooling and synthesis across multiple populations with different genetic, environmental and sociocultural factors. Integrating data across studies will require developing approaches for facilitating pooled analyses and synthesis. We are seeing the beginning of such a global movement across international boundaries with the establishment of P3G by Bartha Knoppers and her colleagues (Public Population Project in Genomics; http://www.p3gconsortium.org/index.cfm).
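
The strata arithmetic above can be checked with a short sketch. The function names and the cohort size of 500,000 are illustrative assumptions, not figures from any study discussed here, and the per-stratum average ignores the fact that real genotype frequencies are far from uniform, so many strata would hold even fewer people than the average suggests.

```python
def strata_count(n_genes: int, levels_per_gene: int = 2) -> int:
    """Number of joint genotype strata when each gene has a fixed number of classes."""
    return levels_per_gene ** n_genes


def mean_people_per_stratum(cohort_size: int, n_genes: int, levels_per_gene: int = 2) -> float:
    """Average cohort members per stratum if people were spread evenly (a simplification)."""
    return cohort_size / strata_count(n_genes, levels_per_gene)


if __name__ == "__main__":
    hypothetical_cohort = 500_000  # assumed size, for illustration only
    for n_genes in (10, 20):
        print(
            f"{n_genes} dichotomous genes: "
            f"{strata_count(n_genes):,} strata, "
            f"{mean_people_per_stratum(hypothetical_cohort, n_genes):.2f} people per stratum on average"
        )
```

Under these assumptions, 20 dichotomous genes already leave less than one person per stratum on average, before any environmental exposure is added as a further stratifying factor.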

Second, we need systematic integration of all human genome epidemiology studies. To build our knowledge base on human genes and health, we need to carry out different types of epidemiologic studies and synthesize their results. Epidemiologic studies can be cohort, case-control or cross-sectional in nature. The strengths and limitations of each design are well known3. Cohort studies are often erroneously perceived as inherently superior to case-control studies. Given the large variation in funds and time needed to conduct cohort studies, every effort should be made to conduct case-control studies that are based on a valid population sampling scheme of newly diagnosed cases in well-defined communities and appropriately selected controls. Well-designed population-based incident case-control studies can even be nested in a larger population cohort or population under surveillance4.

To develop a systematic approach to the integration of epidemiologic data on human genes, the Human Genome Epidemiology Network (HuGENet; http://www.cdc.gov/genomics/hugenet/default.htm) was launched in 1998. This network of individuals and organizations continuously assesses the impact of human genome variation on population health. HuGENet develops and applies systematic approaches to build the global knowledge base on genes and diseases. In May 2004, the network had more than 700 collaborators from 40 different countries, and its website featured 26 reviews of specific gene-disease associations. In addition, HuGENet has continuously abstracted epidemiologic articles on human genes in an online database searchable by gene, outcome and risk factor. Because of the tendency for publication bias, a rigorous, ongoing systematic evaluation of both published and unpublished data is now needed.

Third, we need evidence-based processes that use epidemiologic information. The synthesis of epidemiologic and biologic data should lead to an evidence-based process that assesses the value of genomic information in health care and disease prevention. For example, an interaction between factor V Leiden (FVL) and use of oral contraceptives has been documented (joint relative risk of 30; ref. 5). But the absolute risk is relatively low (28 per 10,000 person-years) among women with FVL who use oral contraceptives. Whether it is beneficial to screen women for FVL before prescribing oral contraceptives is unclear. Venous thrombosis is relatively rare, and mortality from venous thrombosis is low in young women6. For healthy women contemplating using oral contraceptives, the risk-benefit equation would not currently favor screening7. This example illustrates that epidemiologic data need to be collected to inform clinical trials and decision-making for health practice. As single cohort studies are carried out around the world, we can begin to synthesize the incomplete epidemiologic knowledge base for use in policy and practice. Such syntheses will also uncover gaps in our knowledge base that can be filled by new research from ongoing studies.
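
The contrast between relative and absolute risk in the FVL example can be made concrete with a rough calculation. This is only a sketch: it assumes that the joint relative risk of 30 is measured against women who neither carry FVL nor use oral contraceptives, and the function names are illustrative. The two input figures are the ones quoted above; everything else is derived from them.

```python
def implied_reference_rate(joint_rate: float, joint_relative_risk: float) -> float:
    """Rate in the reference group implied by a joint rate and a joint relative risk."""
    return joint_rate / joint_relative_risk


def person_years_per_excess_event(joint_rate: float, reference_rate: float,
                                  per: float = 10_000) -> float:
    """Person-years of joint exposure associated with one excess event."""
    return per / (joint_rate - reference_rate)


if __name__ == "__main__":
    joint_rate = 28.0           # events per 10,000 person-years (FVL plus oral contraceptives)
    joint_relative_risk = 30.0  # relative to the assumed reference group

    reference_rate = implied_reference_rate(joint_rate, joint_relative_risk)
    excess_rate = joint_rate - reference_rate

    print(f"Implied reference rate: {reference_rate:.2f} per 10,000 person-years")
    print(f"Excess rate: {excess_rate:.2f} per 10,000 person-years")
    print(f"Person-years per excess event: "
          f"{person_years_per_excess_event(joint_rate, reference_rate):.0f}")
```

Under that assumption, a thirtyfold relative risk corresponds to an implied reference rate of less than 1 event per 10,000 person-years, which is why the absolute risk stays low despite the strong interaction.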

It is time that we develop a global public health genomics initiative that builds on the currently fragmented efforts of genetic-epidemiologic research around the world. This initiative can be developed through public-private-academic collaborations. In particular, we need to build a robust process that allows data from many biobanks to be integrated through standardized platforms for joint analyses. Also, we need to integrate data obtained from all valid epidemiologic study designs, notably population-based incident case-control studies. Systematic synthesis of epidemiologic data takes time and skills and should be allocated sufficient resources. This proposed initiative can take us a long way towards translating human genome discoveries into population health benefits for citizens of the twenty-first century.