Researchers hope to create a massive, crowdsourced DNA database for genetic studies. Credit: Tek Image/Science Photo Library/Corbis

Geneticists have launched a project to test whether they can study millions of genomes — without collecting a drop of blood or tube of spit themselves.

The project, DNA.LAND, aims to entice people who have already had their genomes analysed by consumer genetics companies to share that data, allowing DNA.LAND geneticists to study the information.

Although some consumer genetic-testing companies share data with researchers, they provide only aggregate information about their customers, not individual genomes. Because the data are not always accompanied by detailed information on patients' health, they are of limited use for drawing links between genes and disease.

“Millions of people have access to their genomes, and many more millions will join them in the near future,” says computational geneticist Yaniv Erlich. He is launching DNA.LAND with fellow geneticist Joseph Pickrell at the New York Genome Center and Columbia University in New York. “Can you get to the point that instead of paying for each study from scratch, we can use the crowd to collect and repurpose this data?” Erlich asks.

Erlich will present the project on 10 October at the annual meeting of the American Society of Human Genetics (ASHG) in Baltimore, Maryland. This is not the first time that he has sought to engage the public to assemble data for large research studies. For instance, Erlich has previously combined data from genealogy websites into the world’s largest family tree, with information on 13 million people.

DNA.LAND is an example of the 'participatory turn' in human-subjects research, says Michelle Meyer, a bioethicist and legal scholar at the Icahn School of Medicine at Mount Sinai in New York. It is a smart research model, she says, because it keeps sequencing and data-storage costs low and avoids the patchwork of federal and state laws governing genetic testing itself.

DNA donors

Erlich hopes to tap the genomes of up to three million customers of companies such as 23andMe, Ancestry.com and Family Tree DNA. The companies allow people to download a file containing the readout of their genetic results.

By combining these data with other information about the participants, such as details of their health, Erlich hopes to assemble a very large data set. The pool of potential donors is also growing: a recent analysis suggested that as many as 2 billion genomes could be sequenced by 2025. “The sky is the limit,” he says.

Erlich has studied the potential for unmasking the identities of anonymous donors of genetic data, and the study's consent document warns participants that “we cannot guarantee that your identity and/or data will never become known, which could have significant implications in some scenarios. We estimate that the risk for such a confidentiality breach is low but not zero.” Erlich and Pickrell have adopted what they call a “skin in the game” philosophy by making their own genomes publicly available.

Meyer, who supports the project, says that it might require a more detailed consent briefing. That could mean spelling out the risks that participants face if their identities are revealed, or suggesting that they consult their families before entering information about them. When DNA.LAND participants create an account, they are asked for the names and dates of birth of their parents.

“Usually, genomics studies suggest discussing your decision to participate with close family members,” Meyer says. “Here, genomic data is combined with parents' names and dates of birth, both identifiers, so it was surprising that there was no mention of risks to family members.”

Erlich says that his team wanted to keep the consent form succinct, and that DNA.LAND provides more detailed information as participants progress to relevant parts of the study. He adds that the team is open to making revisions to its consent documents.

Crowdsourced science

DNA.LAND will not pay participants, but uses other means to entice people to share their data. It will provide shareable digital contribution badges to participants as they contribute more information, and promises to provide new findings about their genomes in return.

For instance, DNA.LAND will use a method called imputation to stitch together a coherent data set from genomes analysed by different companies, each of which tests for a different set of genetic markers. Imputation allows the project to infer the identities of gene variants that were not originally tested, filling in gaps on the basis of knowledge about specific markers that are often inherited together. Participants will be told about newly identified genetic variants uncovered by imputation. The researchers have also promised to tell participants if the work reveals that they have relatives in the project database.
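
The idea behind imputation can be illustrated with a toy sketch. This is not DNA.LAND's method, which the article does not describe in detail; it only shows the principle of filling in untested markers from markers that tend to be inherited together. The reference haplotypes, the `?` symbol for an untested marker and the function name `impute` are all assumptions made for illustration, and real imputation tools use far larger reference panels and probabilistic models.

```python
from collections import Counter

# Hypothetical reference panel: each string is one haplotype,
# with one allele per marker position. Invented for illustration.
REFERENCE_HAPLOTYPES = [
    "ACGT",
    "ACGT",
    "ACGA",
    "GTCA",
    "GTCA",
    "GTCT",
]


def impute(observed, reference=REFERENCE_HAPLOTYPES):
    """Fill in untested markers ('?') in an observed haplotype.

    Keeps only the reference haplotypes that agree with every tested
    marker, then takes the most common allele among them at each gap.
    """
    matches = [
        hap for hap in reference
        if all(o == "?" or o == r for o, r in zip(observed, hap))
    ]
    if not matches:
        # Nothing in the panel is compatible: leave the gaps as they are.
        return observed

    filled = []
    for i, allele in enumerate(observed):
        if allele != "?":
            filled.append(allele)
        else:
            counts = Counter(hap[i] for hap in matches)
            filled.append(counts.most_common(1)[0][0])
    return "".join(filled)


if __name__ == "__main__":
    # A test that covered only the first two markers.
    print(impute("AC??"))
    # A test that covered only the first and last markers.
    print(impute("G??A"))
```

In this sketch, a person tested only at the first two markers is matched to the reference haplotypes that share those alleles, and the untested positions are filled with the alleles most often seen alongside them.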

Statistical geneticist Gonçalo Abecasis at the University of Michigan, Ann Arbor, has enlisted 7,200 research participants in a separate genetics study called Genes for Good. He says that both his project and DNA.LAND are motivated by geneticists' growing awareness that it will take very large data sets to understand how genes influence human health. That may require finding new ways to recruit research participants — such as the Facebook app created by Genes for Good.

Going viral

Meyer says that the ability to learn more about their own genomes and find new relatives could attract genetic genealogists to the DNA.LAND project. Such people are interested in using DNA to explore their family histories, and are often avid users of genetic-testing services. But attracting other groups of people might be more difficult, she says.

“I have no doubt that genomics nerds will flock to the site, but a lot of interesting research they might want to do will require large sample sizes, so they'll need to go beyond the community of early adopters and academics who attend ASHG each year,” Meyer says.

Geneticist and entrepreneur David Mittelman, who is based in Houston, Texas, says that people can be convinced to share their data if it is easy and fun, and if they get something in return. He notes that DNA.LAND's creators understand this. On 5 October, Erlich tweeted that he had used the project’s relative finder to discover that he and human geneticist Nathan Pearson of the New York Genome Center are on the order of third or fourth cousins, prompting a flurry of tweets.

“That kind of engagement and viral promotion is how gaming companies like Zynga spread their games through Facebook and the Internet,” Mittelman says. “That’s the recipe for success: standardize the tools, engage folks and give them a value-add, and if it’s easy to share, they will do it.”