Merging and emerging cohorts: Necessary but not sufficient

Collins, Francis S.; Manolio, Teri A.

doi:10.1038/445259a

Commentary
Published: 17 January 2007

Francis S. Collins¹ &
Teri A. Manolio¹

Nature volume 445, page 259 (2007)

The proposal advocated in the preceding Commentary by Willett et al.¹, namely to extend existing cohort studies rather than start a new large-scale prospective study from scratch, has many merits. Indeed, a US National Institutes of Health (NIH) study group that assessed the pros and cons of various models in 2004 considered this option in some depth, and their report² made many of the same points.

Certainly, assembling existing cohorts into a large consortium would provide a powerful resource for investigating genetic and environmental factors in health and disease. The arguments that this method is likely to be less costly than a new cohort, and would yield results more quickly, carry considerable weight. But Willett et al. do not address all of the suboptimal aspects of this approach. Those should be clearly noted, lest expectations of such a consortium exceed what it is likely to deliver.

First, there is the issue of standardization. Phenotypic measures used by the existing cohorts, although standardized within cohorts, have not followed uniform procedures across studies, and so there will be significant challenges to merging data from different studies in a valid way. Moreover, key environmental exposures or risk factors will almost certainly differ systematically across cohorts. Combining studies that were focused on specific population subgroups will therefore introduce biases that can be corrected only by limiting the analysis to the lowest common denominator of valid, unbiased exposures.

Second, the reliance on legacy studies fails to take advantage of new tools for measuring dietary intake, physical activity and environmental exposures (as are now being supported through the NIH Genes and Environment Initiative³), because many of these measurements — such as precise ambulatory data — cannot be made on stored biospecimens.

Third, representation has been a major concern driving the national cohort proposal. Despite recent attempts to improve representation of minorities and socioeconomically disadvantaged participants in newer cohorts, their proportions are still far below those in the US population. Men, participants from the south and those with lower levels of education are also substantially under-represented, although this might be addressed somewhat by statistical adjustments.

Fourth, under-representation of people younger than 50 is substantial in these existing cohorts (see Figure) and will only get worse with time. If we wish to address complex disease risk across the lifespan, we need to study diseases developing in adolescence and young adulthood, such as asthma, autoimmune disease and major psychoses. Even the investigation of mid-life diseases would be limited by the lack of stored biospecimens from earlier life.

Finally, the full value of a large-scale cohort study will depend on free and open access to the data by all qualified investigators. This may be difficult to achieve with a combination of existing cohorts, given the expectations of current investigators about control of the data, and the limitations of the consent given by existing study participants.

More ways than one

There is no question that a new cohort study would require many years to implement and to generate results, although useful findings would be available on more common diseases within five years of cohort recruitment⁴. We agree with Willett et al., therefore, that it is reasonable in the interim to seek ways to form consortia of existing studies. But these two models need not be thought of as mutually exclusive.

We must also recognize that environmental exposures (including emerging infections) and preventive or therapeutic interventions will probably change dramatically in the next two decades. Limiting our research enterprise to the exclusive study of existing cohorts, especially without collection of new risk information and recruitment of participants under-represented in existing studies, may ultimately jeopardize our ability to address these evolving health risks in an epidemiologically rigorous manner.

Admittedly, this discussion remains hypothetical, because serious budgetary challenges make a new national cohort an unlikely prospect at the present time. Some may wonder whether the United States can afford both an expansion of existing cohorts and a new national cohort. We believe the real question is whether it can afford not to do both, given the enormous and growing healthcare costs of complex diseases. Finding the genetic causes of even one of these diseases could potentially save billions of dollars in medical costs if appropriate preventive interventions can be developed. Despite the fiscal realities, therefore, we must continue to make the case for both a merging of cohorts now and the founding of a more rigorously designed national cohort in the future, when funds are available. While we recognize the massive uncertainties in budget situations and priorities, we believe that future generations will wonder why we didn't try as hard as possible to get both of these kinds of studies underway.

References

1. Willett, W. C. et al. Nature 445, 257–258 (2007).
2. www.genome.gov
3. NIH Genes and Environment Initiative http://www.gei.nih.gov/
4. Manolio, T. A., Bailey-Wilson, J. E. & Collins, F. S. Nature Rev. Genet. 7, 812–820 (2006).

Acknowledgements

We acknowledge A. Guttmacher, E. Harris and L. Rodriguez for their advice.

Author information

Authors and Affiliations

National Human Genome Research Institute, National Institutes of Health, 31 Center Drive, Bethesda, 20892-2152, Maryland, USA
Francis S. Collins & Teri A. Manolio

About this article

Cite this article

Collins, F., Manolio, T. Merging and emerging cohorts: Necessary but not sufficient. Nature 445, 259 (2007). https://doi.org/10.1038/445259a

Published: 17 January 2007
Issue Date: 18 January 2007
DOI: https://doi.org/10.1038/445259a
