Hobbyists add depth to ancestry trawls.
Hours after Joseph Pickrell put his genome on the internet, an anonymous blogger took the data and concluded that he came from Ashkenazi Jewish stock. Pickrell, a genetics graduate student at the University of Chicago, Illinois, was sceptical about the claim. But after talking to relatives, he discovered that he had a Jewish great-grandfather who had moved to the United States from Poland at the turn of the twentieth century. "It was a part of my ancestry I was totally unaware of," he says.
The blogger, who writes under the pseudonym Dienekes Pontikos at http://dodecad.blogspot.com, had commandeered Pickrell's DNA as part of the Dodecad Ancestry Project, an ambitious effort in which cutting-edge genomic analysis meets Web 2.0. Pontikos analyses genetic data submitted by followers of his blog to reconstruct personal ancestry and human population history, and reports his findings online. He is part of a small but growing group of 'genome bloggers', a mix of professional scientists and hobbyists who are showing that widely available tools for computational biology can enable recreational bioinformaticians to make new discoveries.
"They are not amateurs. They are far from being amateurs," says Doron Behar, a population geneticist at Rambam Health Care Campus in Haifa, Israel, who studies human history. "I cannot stress enough the level of appreciation I have for their efforts."
Pontikos has so far analysed several hundred thousand single-letter DNA variations from more than 2,200 individuals. That includes more than 200 submitted to him by readers of his blog, who had had their genomes analysed by genetics testing firms such as 23andMe, based in Mountain View, California, with the remainder coming from publicly available datasets. The readers volunteering their genomes (identities stay private) are mostly keen to delve into their own ancestry. But Pontikos, who is from Greece and describes himself as an "anthropology dilettante", is more interested in unfurling the history of populations that tend to be overlooked by human-population geneticists. For instance, his analysis of genomes from people living in northern Eurasia reveals a genetic connection between populations in northern Finland and central Siberia (see 'Meet the ancestors').
David Wesolowski, a 31-year-old Australian who runs the Eurogenes ancestry project (http://bga101.blogspot.com), also focuses on understudied populations. "It's a response, in a way, to the lack of formal work that's been done in certain areas, so we're doing it ourselves," he says. Wesolowski and a colleague have drilled into the population history of people living in Iran and eastern Turkey who identify as descendants of ancient Assyrians, and who sent their DNA for analysis. Preliminary findings suggest their ancestors may have once mixed with local Jewish populations, and Wesolowski plans to submit these results to a peer-reviewed journal.
But Pontikos sees little point in formally publishing his findings. "I can bypass them entirely, and have the entire world review what I write," he wrote in an e-mail. Indeed, comments on his blog — "could you please provide the eigenvalues for the principal component analysis", for instance — read like the niggling recommendations of a manuscript reviewer.
Pickrell notes that Dodecad and Eurogenes use cutting-edge techniques and open-source software developed by geneticists studying population history. The methods — which involve modelling past mixing between populations and distilling vast quantities of genotype data — still stir debate in the peer-reviewed literature because they can be difficult to interpret unambiguously, says John Novembre, a population geneticist at the University of California, Los Angeles. Behar, whose data on Jewish ancestry have been used by both projects, cautions that the techniques are more robust when applied to the history of an ethnic group, rather than the ancestry of an individual.
As for concerns about the genetic privacy of those offering their genomes for analysis, "I don't think this is too worrisome," says Hank Greely, director of the Center for Law and the Biosciences at Stanford University in California. Both projects provide adequate privacy protection, he says, although both could do a slightly better job of disclosing the risk that participants' data might be released.
Although people may be happy to part with their genomes to learn more about their ancestry, the genetic and trait data needed for biomedical applications are much harder, if not impossible, for amateurs to come by. Public repositories, such as the US National Institutes of Health's database of Genotypes and Phenotypes, tightly restrict access.
One effort to change that is the Personal Genome Project, which is spearheaded by George Church, a geneticist at Harvard Medical School in Boston. The project aims to make the complete genome sequences and traits of 100,000 people freely available to anyone, with no strings attached. So far it has enrolled 1,000 participants and published near-complete genomes for 10 of them. Pickrell and 11 other scientists and genomics experts also added to the trove of freely available genomic data recently when they released their genetic data as part of a project called Genomes Unzipped.
Church argues that better access to high-quality data could help this kind of informal bioinformatics to flourish, enabling computer-savvy people to make important contributions to genomics, just as they have with online businesses such as Facebook. "It didn't take that much training to become a social-networking entrepreneur. You just had to be a good coder," he says. With bioinformatics, "I think we're in a similar position."