Nature | News

Giant gene banks take on disease

Researchers bring together troves of DNA sequences in the hope of teasing out links between traits and genetic variants.

Joe Raedle/Getty

Vast stores of DNA samples and data have been produced by the increasing pace of genetic sequencing.

Early last year, three researchers set out to create one genetic data set to rule them all. The trio wanted to assemble the world’s most comprehensive catalogue of human genetic variation, a single reference database that would be useful to researchers hunting rare disease-causing genetic variants.

Unlike past ‘big data’ projects, which have involved large groups of scientists, this one deliberately kept itself small, deploying just five analysts. Nearly two years in, it has identified about 50 million genetic variants — points at which one person’s DNA differs from another’s — in whole-genome sequence data collected by 23 other research collaborations. The group, called the Haplotype Reference Consortium, will unveil its database in San Diego, California, on 20 October, at the annual meeting of the American Society of Human Genetics.

Geneticists have not always been so willing to share data. But that seems to be changing. “It’s been surprisingly easy to bring all these data sets together,” says Jonathan Marchini, a statistical geneticist at the University of Oxford, UK, and one of the consortium’s leaders. “There is a lot of goodwill between the people in the field; they all understand the benefits of doing this and have worked hard to make their data available.”

In the past five years, there has been an explosion in rates of sequencing human genomes thanks to the falling cost of the technology. At the same time, geneticists have realized that linking genes to diseases and traits will require much bigger sample sizes than any one research centre can assemble.

It was once assumed that common diseases and traits could be traced to a few common genetic variants that would be relatively easy to find. But that has turned out not to be the case. It is now clear that thousands of different variants each play a small part in determining a person’s height or risk of schizophrenia, for example. And finding those thousands of variants means looking at a daunting number of people. Fortunately, the same increase in the pace of sequencing has made it possible to collect enough genomes to uncover them.

“Here are a bunch of data sets that individually cost millions of dollars to generate, and you have people willing to make that data available to a shared resource, which is amazing,” says geneticist Daniel MacArthur of Massachusetts General Hospital in Boston.

MacArthur is part of the Exome Aggregation Consortium, another attempt to create a supersized library of human genetic variation. On 20 October, MacArthur and his colleagues plan to unveil their own public database containing the protein-coding portions, or exomes, of 63,000 human genomes originally gathered by other researchers. “We can say from looking at a very large cohort of people … this is what the distribution of rare variation looks like,” says MacArthur. “And that is very powerful.”

MacArthur is developing tools to comb the data for mutations that disable genes. Only some of these ‘loss-of-function’ mutations cause harm; predicting which are pathogenic will require knowing more about which ones regularly occur in healthy people.

Some studies are already reaping the benefits of huge data sets. On 5 October researchers published a paper on the genetics of height that included data on more than 250,000 people (A. R. Wood et al. Nature Genet. http://doi.org/v6k; 2014). The data had been gathered in separate genome-wide association studies, which look for correlations between genetic variants and traits or diseases, and pooled as part of the Genetic Investigation of Anthropometric Traits (GIANT) Consortium. The paper reported 697 new variants linked to height, more than tripling the previous count. Still, researchers estimate that the hundreds of common variants now identified account for only 16% of the genetic contributors to height.

Throwing even more data into the pool could reveal some of the rest, says Joel Hirschhorn, a geneticist at the Broad Institute in Cambridge, Massachusetts, and a leader of the GIANT consortium.

Journal name: Nature
Volume: 514
Pages: 282
DOI: 10.1038/514282a
