The era of personal genomics is upon us, with advances in technologies such as DNA sequencing and genotyping fuelling the fires. Personal genomics is a story of researchers looking for genetic clues to our most common diseases, of dazzling advances in genetic analysis technology and of lingering questions about how the public will view and use the information.

DNA sequencing is clearly driving much of this revolution in personal genomics. In late May 2007, 454 Life Sciences in Branford, Connecticut, and the Human Genome Sequencing Center at the Baylor College of Medicine in Houston, Texas, made headlines around the world with the announcement that they had sequenced James Watson's entire genome using 454 Life Sciences' next-generation sequencing technology. And just four months later researchers at the J. Craig Venter Institute in Rockville, Maryland, along with collaborators at The Hospital for Sick Children in Toronto, Canada, and the University of California, San Diego, published the first full genome sequence of a single individual, Craig Venter (ref. 1). This analysis, though, relied on the traditional approach of Sanger sequencing.

The Broad Institute's Chad Nusbaum uses several next-generation sequencing systems in his research. Credit: M. NEMCHUK, BROAD INST.

Now, some groups are looking to take DNA sequencing and personal genomics to even higher levels. “We want to look at 100,000 genomes and rather than just look at the genomics, in which you get an idea of variation like with the HapMap, we want to actually look at the trait connected with the variation and the environment,” says George Church of Harvard Medical School in Boston, Massachusetts, and founder of the Personal Genome Project (see 'Being well informed').

When first conceived in 2003, the Personal Genome Project faced numerous challenges, not least that the technology required to meet its goals was not even available. But technology is catching up with ambition, and advances in DNA sequencing are making it possible to decode individual genomes much faster, making endeavours such as the Personal Genome Project more feasible.

Sequencing's new wild west

A new generation of faster DNA-sequencing systems has exploded onto the genetic-analysis scene, with at least five companies offering or preparing to offer sequencers that boast amazing output. Several sequencers produce upwards of a billion bases of raw data per run — the equivalent of one-third of the human genome.

But Chad Nusbaum, co-director of the genome-biology programme at the Broad Institute in Cambridge, Massachusetts, is quick to point out that these new systems are in fact quite different from one another and at various stages of maturity. “The principle that we use when applying these new technologies is that there is a lot of expensive sequencing that we do with Applied Biosystems' 3730xl system and anything that we can move over to the new technologies, as long as it is effective, is bound to be cheaper.” The systems available now from Roche, Illumina and Applied Biosystems do seem to be effective, as the Broad Institute and other organizations are using them for various sequencing-based applications.

Assembling the future

By the end of last year, 454 Life Sciences, which was founded in 2000 and was recently acquired by Roche, had more than 60 of its sequencing systems placed around the world. “Our technology is in all major US genome centres and some of the international centres,” says Michael Egholm, vice-president of research and development at 454 Life Sciences.

The technology developed by 454 Life Sciences is based on two fundamental principles: emulsion PCR and pyrosequencing. Emulsion PCR side-steps the conventional process of bacterial cloning by attaching fragments of DNA 300 to 500 base pairs long to beads in vitro, then amplifying them with PCR to make millions of identical copies. Pyrosequencing allows for a massively parallel reaction format, carried out in 1.6 million wells on a PicoTiterPlate. “Right now, day in and day out, we can perform 400,000 reads of 250 bases each with an accuracy of 99.5% or better,” says Egholm. Although the 454 Life Sciences system is not as accurate as conventional Sanger sequencing, Egholm notes that it is an order of magnitude more productive (see 'Truth and accuracy').
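To put Egholm's figures in perspective, the short Python sketch below works through the arithmetic. It assumes the quoted 99.5% is a per-base accuracy and treats it as a lower bound; the function names are illustrative and do not come from 454 Life Sciences.

```python
# Figures quoted by Egholm; everything else is derived arithmetic.
READS_PER_RUN = 400_000
BASES_PER_READ = 250
RAW_ACCURACY = 0.995  # "99.5% or better", assumed here to be per base


def run_yield(reads: int, read_length: int) -> int:
    """Total raw bases produced in one run."""
    return reads * read_length


def expected_errors(bases: int, accuracy: float) -> float:
    """Expected number of miscalled bases at a given per-base accuracy."""
    return bases * (1.0 - accuracy)


total = run_yield(READS_PER_RUN, BASES_PER_READ)
print(f"{total:,} raw bases per run")
print(f"up to about {expected_errors(total, RAW_ACCURACY):,.0f} miscalled bases per run")
print(f"up to about {expected_errors(BASES_PER_READ, RAW_ACCURACY):.2f} miscalled bases per read")
```

On these assumptions a run yields 100 million raw bases, with at most roughly one miscalled base per 250-base read before any consensus is built from overlapping reads.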

The upgraded Genome Sequencer FLX System allows more sequencing cycles and therefore longer reads than the previous Genome Sequencer 20 System. Longer reads help in whole-genome sequencing and assembly applications. “We believe that shortly there will be many more de novo assembled genomes due to our technology,” says Egholm. He notes that the genomes of several microorganisms have been assembled from scratch by use of 454 sequencing, and the technology has also been used to supplement Sanger sequencing on a few projects involving larger genomes.

The Genome Sequencer FLX, developed jointly by 454 Life Sciences and Roche Applied Sciences, is based on 454 sequencing technology. Credit: 454 LIFE SCIENCES

At the Broad Institute, where researchers use two FLX systems and one Genome Sequencer 20, Nusbaum appreciates the ease of the 454 sequencing process. “It is nicer than Sanger sequencing because it is a faster and simpler process.” He points out that at the Broad Institute, sequencing a bacterium can take a month with Sanger methodology, whereas with 454 technology it can be done in a week and without the high degree of clone tracking associated with Sanger sequencing.

Still, for de novo sequencing and to assemble larger genomes, such as those of mammals, longer paired reads (that is, two reads that are a known distance apart) will be necessary, an issue that Roche and 454 are trying to address. “Whether [454 sequencing] will work with a mammalian genome is a good question, and it is a little way off,” says Nusbaum. But he optimistically notes that 454 Life Sciences has exceeded his expectations in surmounting several other technical hurdles. Egholm, however, is much more direct in his vision for the future of 454 sequencing. “My goal is simple, I want to displace Sanger sequencing for de novo sequencing.”

Counting games

Like the Genome Sequencer FLX system, the next-generation systems from Illumina and Applied Biosystems start by clonally amplifying DNA fragments: Applied Biosystems uses emulsion PCR on beads, whereas Illumina's Genome Analyzer amplifies fragments as clusters on a solid surface. From there the methods of sequencing are quite different from each other.

Illumina now offers several SNP and copy-number-variation probe chips for genotyping applications. Credit: ILLUMINA

In January this year, Illumina, located in San Diego, California, acquired the Hayward-based firm Solexa. Solexa's key technology, previously called the Solexa 1G and now named the Genome Analyzer, is a next-generation sequencing system that can sequence the equivalent of a third of the entire human genome in a single run. The Broad Institute now uses 16 Genome Analyzers for various projects. “Any application that is counting-related is a very good one to perform using Illumina's system,” says Nusbaum.

Nusbaum and his colleagues, along with other groups, have already demonstrated the usefulness of the Genome Analyzer in looking at patterns of chromatin structure by using chromatin immunoprecipitation (refs 2–5). “It is an incredibly powerful application of the technology,” says Nusbaum. By pulling down DNA bound to histones carrying specific modifications, sequencing it and mapping it back to the genome, they could map the status of chromatin across the genome and throughout development. Nusbaum adds that it is also an incredibly easy application of the technology and anticipates that Illumina instruments will be used for other applications such as transcriptional profiling or microRNA and small RNA discovery. “It is also a great way to identify polymorphisms in genomes that are not extremely different.”

Applied Biosystems in Foster City, California, is rolling out its new sequencing by oligonucleotide ligation and detection, or SOLiD, system in October 2007. The target is to cover a whole human genome in one run, says Kevin McKernan, senior director of scientific operations at Applied Biosystems. McKernan says that in-house, the SOLiD system has been obtaining around a gigabase more data than its target, achieving 4 gigabases of sequence per run that aligns to the target genome, and 8 gigabases overall. McKernan thinks that with advances in the PCR process over time, this will turn into 8 gigabases of sequence that aligns to the target genome. The SOLiD system differs from other next-generation sequencing systems by performing sequencing by ligation rather than the polymerase-based sequencing by synthesis that other systems use. “Traditionally, people probably don't think of ligases as being more accurate than polymerases,” says McKernan. But he claims that the SOLiD system achieves its accuracy by the new way in which the probes are encoded.

The next-generation SOLiD system from Applied Biosystems uses ligation-based sequencing methods. Credit: APPLIED BIOSYSTEMS

By doing successive rounds of ligation and looking at particular probe colour, the SOLiD sequencer obtains information not from just one base, but also from adjacent bases. So after ligation every colour has more than one base of information, which permits multiple colour calls for each base location. By using this redundant information, McKernan says a tremendous amount of error correction can be performed when making base-call decisions. “What we have been seeing is that this is giving us a ten to twenty times improvement over polymerase-based systems in terms of raw accuracy,” says McKernan.
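The error-correction idea can be illustrated with a toy version of two-base colour encoding. The sketch below assumes the widely published scheme in which each colour is the XOR of two-bit codes for adjacent bases; that table and the function names are assumptions for illustration, not details given by McKernan, and the toy ignores changes at the very ends of a read.

```python
# Two-bit base codes; a colour is the XOR of two adjacent base codes.
# ASSUMPTION: this is the commonly published di-base scheme, not from the article.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = {value: base for base, value in CODE.items()}


def encode(seq: str) -> list[int]:
    """One colour for each overlapping pair of adjacent bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colours: list[int]) -> str:
    """Rebuild the bases from a known first base and the colour string."""
    seq = [first_base]
    for colour in colours:
        seq.append(BASE[CODE[seq[-1]] ^ colour])
    return "".join(seq)


def classify(ref: list[int], read: list[int]) -> str:
    """Interpret colour differences between a read and the reference."""
    diffs = [i for i, (x, y) in enumerate(zip(ref, read)) if x != y]
    if not diffs:
        return "match"
    if len(diffs) == 1:
        # A real interior base change always alters two adjacent colours.
        return "likely miscall"
    if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
            and ref[diffs[0]] ^ ref[diffs[1]] == read[diffs[0]] ^ read[diffs[1]]):
        return "consistent with SNP"
    return "unresolved"


reference = "ATGGCATC"
ref_colours = encode(reference)
snp_colours = encode("ATGACATC")   # one real base change (G to A)
miscall = list(ref_colours)
miscall[2] ^= 1                    # one colour misread by the instrument

print(ref_colours, decode("A", ref_colours))
print(classify(ref_colours, snp_colours))
print(classify(ref_colours, miscall))
```

Because every interior base contributes to two colours, a genuine single-base change shows up as two adjacent colour changes, whereas an isolated colour change is more plausibly a measurement error.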

Applied Biosystems views the SOLiD system as particularly well suited for the cancer-research community and for applications in personal genomics. “Folks in the cancer community are going to gravitate towards it because they are looking for low-frequency mutations and the higher accuracy the system delivers through our error-correction scheme will be beneficial,” says McKernan. He notes that cancer genomes tend to be very complex, with copy-number changes or copy-number neutral changes, such as translocations. Using sequencing systems such as SOLiD, these changes can be visualized by watching when paired reads, or 'mate pairs', break, quickly providing information about translocations. These data are currently analysed with copy-number chip assays (see 'Chipping out our differences').

Both the Genome Analyzer and the SOLiD system have some limitations because of their short read lengths of around 35 bases per read. “For sequencing larger genomes, 35 bases or fewer per run just won't cut it. We tried with 100 bases for a long time and had problems,” says Egholm. McKernan says that the way Applied Biosystems is attempting to resolve this issue is by using mate pairs. “Whenever there is a single read in a repetitive region you do not know where to place it,” says McKernan. But he notes that with a mate pair you know that that read is linked to something 3–4 kilobases away, so those reads can be placed accurately. And this placement accuracy can be very important, as the repeat content of the human genome is high, estimated to be upwards of 50%.
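The logic of placing a repetitive read by its mate can be shown with a toy example. The sketch below is an illustration only: the sequences are invented, the expected separation is scaled down from kilobases to a few tens of bases, and it uses exact string matching and ignores strand orientation, unlike a real aligner.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Every start position at which the read matches the genome exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


def place_with_mate(genome: str, read: str, mate: str,
                    insert: int, tolerance: int) -> list[int]:
    """Keep placements of `read` that have a `mate` hit about `insert` bases downstream."""
    mate_hits = find_all(genome, mate)
    return [pos for pos in find_all(genome, read)
            if any(abs((m - pos) - insert) <= tolerance for m in mate_hits)]


# Invented toy genome containing the same repeat twice.
repeat = "TTAGGGTTAGGG"
genome = ("CCATGCAAGT" + repeat + "ACCGTTGACA" + repeat
          + "CTGAAGCTTGTCAGCGATAC")
mate = "TCAGCGAT"  # lies in unique sequence, so it maps to one place only

print(find_all(genome, repeat))                      # the read alone is ambiguous
print(place_with_mate(genome, repeat, mate, 22, 3))  # the mate resolves it
```

On its own the repetitive read matches two positions; requiring its uniquely mapped mate to lie at the expected distance leaves a single placement.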

Pulling in the 'exome'

Church is excited by the fact that the advanced sequencing technology now in place has drastically lowered the cost of sequencing. “What has happened in the past year is that the price has plummeted by a factor of 100,” he says, making a project of the scope of the Personal Genome Project realistic from a financial perspective. The tricky part now is getting the most useful information from the human genome in a similarly cost-effective manner.

For the Personal Genome Project and other groups, the challenge is to obtain just exons, or the 'exome', from the human genome. Technically, the simplest method, performing in excess of 200,000 individual PCR reactions, would also be the most labour intensive and costly. But groups of researchers and companies are now working on ways to selectively amplify or capture different parts of the genome. “Soon you will be able to cherry-pick all the way along and identify the parts of the genome that are most likely to yield the maximum information,” says Church. And these methods will be very welcome additions to the genomics world because, as Church notes, not all pieces of DNA are created equal. Although he points out that no DNA is 'junk', he contends that by examining only 1% of the genome you can get about 98% of the information about positions that cause changes in traits.

The technology necessary for a personal-genomics revolution is here on the scene. Most people say that the major concern for personal-genomics projects is how to deal with the data from participants. And even on that front there seems to be a lot of optimism within the genomics community. Nusbaum is encouraged that people such as Church have taken up such initiatives as the Personal Genome Project. “I am glad that someone like George is taking this on because he has charisma and clout, so that even if people don't want to hear what he is saying, they have to listen.”