In Retrospect

A decade of shared genomic associations

A paper that analysed genetic variants in 14,000 people to identify disease-associated regions set the standard for collaborative genome-wide association studies and provided methodological advances whose effects are still felt today.

Ten years ago this month, Nature published a landmark study1 that compared the frequencies of hundreds of thousands of common genetic variants (polymorphisms) at single nucleotides in people with and without seven diseases, to look for variants associated with each disease. Such genome-wide association studies (GWAS) provide an agnostic way to identify these variants, unfettered by prevailing — and potentially incorrect — assumptions about which genomic regions are important in disease biology. The study, by the Wellcome Trust Case Control Consortium (WTCCC), set the standard for this field of research, and nearly 3,000 GWAS have since been published.
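The comparison at the heart of a GWAS can be illustrated for a single variant. The sketch below is not the WTCCC's analysis pipeline; it is a minimal allelic chi-square test on a 2×2 table of allele counts, and the function name and the counts are hypothetical values chosen only to mirror a design with 2,000 cases and 3,000 controls. It assumes every row and column total is non-zero.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Returns (chi2, p_value, odds_ratio). Assumes all margins are non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 2,000 cases and 3,000 controls carry two alleles each,
# giving 4,000 case alleles and 6,000 control alleles.
chi2, p_value, odds_ratio = allelic_chi_square(1400, 2600, 1800, 4200)
print(f"chi2={chi2:.2f}  p={p_value:.2e}  OR={odds_ratio:.3f}")
```

A real study repeats a test of this kind at hundreds of thousands of variants, so the threshold for declaring significance has to be far stricter than the conventional 0.05.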

Before the advent of GWAS, few genetic regions associated with disease had been reliably identified, and some researchers despaired of ever finding reproducible associations for most heritable diseases2. GWAS burst onto the scene in 2005, with the demonstration of a surprising association between the complement factor H gene, which was known for its role in immune regulation, and age-related macular degeneration, a leading cause of blindness3. Since then, GWAS have provided many more unexpected insights. One of the great early surprises of GWAS findings, for example, was that less than 10% of disease associations lie in protein-coding regions of the genome4. Another surprise has been the identification of specific regions associated with multiple, seemingly disparate diseases, such as polymorphisms in the gene CDKN2A/B, which are associated with coronary heart disease, type 2 diabetes and melanoma (the most serious form of skin cancer)5.

What made the WTCCC paper special was its large sample size and its pursuit of seven very different diseases — 2,000 cases each of bipolar disorder, coronary heart disease, Crohn's disease, high blood pressure, rheumatoid arthritis and diabetes types 1 and 2, compared with a shared set of 3,000 controls. In addition, the project involved more than 50 research groups across the United Kingdom. Persuading these groups to work collaboratively, hold their individual publications until the group-wide paper was published, and share their data openly with the scientific community was a masterwork of diplomacy, for which the study's organizers richly deserve commendation and gratitude.

By simultaneously studying diseases with differing aetiologies and genetic contributions, the consortium hoped to gain insight into not only the specific genetic architecture of each disease (the number of contributing genes and the sizes of their effects), but also differences between them. The researchers also aimed to address methodological issues to improve the reproducibility of genetic-association studies. The WTCCC achieved these aims and more, and the work's immediate impact was recognized by the consortium being chosen as Scientific American's research leader of the year6 and the article being lauded as The Lancet's paper of the year7.

The study revealed 24 statistically significant associations between diseases and specific single nucleotide polymorphisms (SNPs). In addition, it identified a host of other signals at lesser significance levels that were subsequently shown to harbour reproducible associations in larger studies. The only disease for which no associations were found was high blood pressure — but this was later explained by the discovery that the genetic architecture of this disease differs from that of the other six diseases analysed, involving many variants that each have a small effect. Such variants are detectable in much larger GWAS, and more than 100 regions associated with high blood pressure have since been identified8.

In terms of methodology, the WTCCC made valuable advances in genotype calling — a method used by researchers to discern which genetic variants each individual has at a particular site (their genotype)9. The authors also developed and disseminated methods for imputing non-genotyped variants, which lie between the SNPs assayed in a given study. By developing new algorithms and methods, they improved researchers' ability to reliably identify genotypes, reduce calling errors and infer with high probability SNPs that had not been assayed, and to combine data sets gained from GWAS that analysed different sets of variants9, increasing the power to detect rare disease-associated variants.

The consortium also demonstrated that using a common set of controls across multiple studies is a robust and efficient approach, and one that the team's members expanded further, using individuals studied for one disease as controls for another. The study revealed a previously unsuspected degree of geographic differentiation across the United Kingdom for 13 SNPs (for example, there was a north-to-south difference in the frequency of a variant in the gene TLR1 that might have a role in leprosy and tuberculosis). Finally, their work demonstrated empirically the power of increasing sample sizes to detect a greater number of disease-associated SNPs, and served as a cogent reminder that even more associations could be expected if studies were performed using samples that were larger still.
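The effect of sample size on detection can be made concrete with a rough power calculation. The sketch below is an illustration under stated assumptions, not a method from the WTCCC paper: it uses a normal approximation for a difference in allele frequencies between cases and controls, a significance threshold of 5 × 10^-8 (a common convention that the article does not specify), and hypothetical allele frequencies of 0.32 in cases and 0.30 in controls.

```python
from statistics import NormalDist


def approx_power(freq_cases, freq_controls, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided test for an allele-frequency difference.

    Uses a normal approximation with unpooled variance and ignores the
    negligible probability of rejecting in the wrong direction.
    Each individual contributes two alleles.
    """
    std_normal = NormalDist()
    variance = (
        freq_cases * (1 - freq_cases) / (2 * n_cases)
        + freq_controls * (1 - freq_controls) / (2 * n_controls)
    )
    z_effect = abs(freq_cases - freq_controls) / variance ** 0.5
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(z_effect - z_alpha)


# Hypothetical small-effect variant, tested at the original design size and
# at a tenfold larger size with the same case-to-control ratio.
for scale in (1, 10):
    power = approx_power(0.32, 0.30, 2000 * scale, 3000 * scale)
    print(f"{2000 * scale} cases, {3000 * scale} controls: power ~ {power:.4f}")
```

Under these assumptions, a variant with a small effect is almost never detected at the smaller size but usually detected at the larger one, which is the pattern the larger consortia went on to exploit.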

And the larger sample sizes came! Inspired by the WTCCC, international consortia rapidly formed to pool data. Sample sizes well into the tens of thousands became routine (Fig. 1), and at least 30 GWAS exceeding 100,000 individuals are now available online (www.ebi.ac.uk/gwas). The first study involving roughly 500,000 participants will soon be released (www.ukbiobank.ac.uk).

Figure 1: Ever-increasing sample sizes for genome-wide association studies (GWAS).

In 2007, the Wellcome Trust Case Control Consortium published a landmark association study, in which they analysed genetic variants in 14,000 people to look for those associated with seven diseases1. Sample numbers in GWAS have since grown rapidly. This graph shows the cumulative number of GWAS involving 10,000 samples or more published per year, with those involving different sample sizes indicated in different colours. (Data taken from www.ebi.ac.uk/gwas.)

The WTCCC also helped to propel a revolution in data distribution. The study was one of the first GWAS to provide information about each participant's genotype and associated traits for use by the scientific community. Although access to these data was subsequently controlled to ensure participant confidentiality10, the tradition of open data-sharing and collaboration pioneered by the WTCCC has continued.

Where do we go from here? The flood of GWAS continues unabated, despite predictions that it was a transitional technology that would soon be supplanted by techniques to sequence either entire genomes or all protein-coding regions. Such sequencing studies have certainly identified many rare variants and polymorphisms involving more than a single nucleotide (such as inserted or deleted sections of chromosomes), which GWAS have difficulty detecting. But the low cost and straightforward analytics of GWAS seem likely to ensure its longevity.

A crucial gap in the GWAS spectrum remains to be filled, because ancestrally diverse, non-European populations have been appallingly under-studied11. Notable GWAS in these populations include studies of cardiac conduction in African Americans12 and sleep apnoea in Hispanic and Latino Americans13. One of the next steps will be to identify associations in under-studied populations such as those in Africa and Latin America, and in isolated and indigenous peoples such as those in the Arctic, Pacific islands and Americas. Another outstanding opportunity lies in studies of adverse reactions to drug or other treatments, in which effect sizes are often large and may be directly relevant to clinical care14.

Despite the thousands of studies and millions of genomes examined, associations identified by GWAS still explain only a small fraction of the heritability of complex diseases, and the overwhelming majority fall in regions of the genome that have no known function4. These gaps in knowledge are major challenges and must be overcome if we are to develop effective treatments and improve clinical care15. Ten years on, we are clearly on the right path, as set for us by the WTCCC. But, as with everything in science, the more we know, the more we have to learn.

References

  1. The Wellcome Trust Case Control Consortium. Nature 447, 661–678 (2007).

  2. Hirschhorn, J. N., Lohmueller, K., Byrne, E. & Hirschhorn, K. Genet. Med. 4, 45–61 (2002).

  3. Klein, R. J. et al. Science 308, 385–389 (2005).

  4. Hindorff, L. A. et al. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).

  5. Manolio, T. A., Brooks, L. D. & Collins, F. S. J. Clin. Invest. 118, 1590–1605 (2008).

  6. Mossman, K. Sci. Am. 298, 42 (2008).

  7. Summerskill, W. Lancet 371, 370–371 (2008).

  8. Warren, H. R. et al. Nature Genet. 49, 403–415 (2017).

  9. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. Nature Genet. 39, 906–913 (2007).

  10. Homer, N. et al. PLoS Genet. 4, e1000167 (2008).

  11. Popejoy, A. B. & Fullerton, S. M. Nature 538, 161–164 (2016).

  12. Evans, D. S. et al. Hum. Mol. Genet. 25, 4350–4368 (2016).

  13. Cade, B. E. et al. Am. J. Respir. Crit. Care Med. 194, 886–897 (2016).

  14. Chan, S. L., Jin, S., Loh, M. & Brunham, L. R. Pharmacogenomics 16, 1161–1178 (2015).

  15. Price, A. L., Spencer, C. C. A. & Donnelly, P. Proc. R. Soc. B 282, 20151684 (2015).

Author information

Correspondence to Teri A. Manolio.

Cite this article

Manolio, T. A decade of shared genomic associations. Nature 546, 360–361 (2017). https://doi.org/10.1038/546360a
