High Throughput Sequencing and Cost Trends

Last week I talked about Frederick Sanger's method of sequencing DNA, a methodical approach to what was one of the most pressing problems in molecular biology. To sequence a large piece of DNA with this method, you start at one end, read as far as the reaction will go (usually only about 800 to 1,000 bases, a tiny fraction of the human genome's astonishing 3 billion), and then run a second reaction starting a bit further down the strand. Sanger's method requires that you march down the DNA strand, reaction by reaction.
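To get a feel for what "reaction by reaction" means at genome scale, here is a rough back-of-the-envelope count. It is only a sketch under generous assumptions: every reaction yields a full 800- or 1,000-base read, and reads never overlap or fail. That makes it a lower bound. Real projects needed many times more reactions, because every region is read repeatedly to catch errors. The function name `reactions_needed` is just my own label for the arithmetic.

```python
# Back-of-the-envelope: how many Sanger reactions to walk once across the
# human genome? Assumes perfect, non-overlapping reads, so this is a lower bound.
GENOME_BASES = 3_000_000_000  # approximate human genome size quoted above


def reactions_needed(read_length: int, genome_bases: int = GENOME_BASES) -> int:
    """Minimum number of back-to-back reads needed to cover the genome once."""
    # Ceiling division: a partial final read still costs a whole reaction.
    return -(-genome_bases // read_length)


for read_length in (800, 1000):
    print(f"{read_length}-base reads: {reactions_needed(read_length):,} reactions")
```

Even in this idealized case, that's millions of separate reactions.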

This method is pretty foolproof, but it is very slow. The publicly funded Human Genome Project took about 13 years (1990 to 2003) to decode the genome this way, and at $3 billion it certainly wasn't cheap. (From what I understand, the $3 billion figure covers the total R&D needed to make the project possible; the sequencing itself cost more like $500 million.) Craig Venter and his private company, Celera Genomics, famously (or infamously) challenged the public project with a private initiative of their own. Instead of marching down the genome, they shattered many copies of it into tiny fragments, sequenced the individual pieces, and used supercomputers to stitch them back together into the whole genome, an approach known as whole-genome shotgun sequencing. This cost only $300 million, though Venter's team did have access to the public project's data.
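The "stitch them back together" step is the clever part, so here is a toy sketch of the idea. It assumes error-free fragments from a short, made-up genome with no repeats, and it uses a simple greedy strategy: repeatedly glue together the two pieces whose end and start overlap the most. The functions `overlap` and `greedy_assemble` and the example reads are my own hypothetical illustration, not Celera's actual assembler.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if under min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair of fragments with the largest overlap."""
    # Drop exact duplicates, then drop fragments contained inside a longer one.
    unique = list(dict.fromkeys(fragments))
    frags = [f for i, f in enumerate(unique)
             if not any(i != j and f in g for j, g in enumerate(unique))]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps any more; return the separate pieces (contigs)
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three overlapping reads from the made-up "genome" ATGCGTACGTTAGC, in scrambled order.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

A greedy approach like this breaks down on real genomes, which are full of repeated stretches longer than a single read; that is one reason real assemblers are far more sophisticated.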

Today, either a $3 billion or a $300 million genome would be ludicrous. Saying that the cost of sequencing DNA has plummeted would be a gross understatement. Allow me to divert your attention to an analogy with computers for a minute. Everyone has heard of Moore's law, the observation that the number of transistors that can fit on a computer chip doubles about every 2 years; in other words, chip technology improves at an exponential pace. As a budding biologist, I'm proud to say that the improvement of DNA sequencing technology puts Moore's law to shame. Rob Carlson, author of Biology is Technology (on my reading list!), generated some very nice graphs on his blog, one of which I've reproduced above. The blue line is the per-base cost of DNA sequencing (I'll talk about what the other curves represent next week). The first data point appears to correspond to 1990, at a price of around $20 per base! Note the log scale: each tick on the y-axis is a 10-fold price difference (in US dollars) from its neighbors. The latest numbers are about $3 × 10^-7 per base, a mere 0.00003 cents. What a remarkable change in just over 20 years. If we applied a variation of Moore's law, in which the cost per base halves every 2 years, starting from the $20 of 1990, today's cost would be about a penny a base: a price over 5 years out of date and roughly 30,000 times (nearly 5 orders of magnitude) too high!
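If you want to check the Moore's law comparison yourself, here is a small sketch of the arithmetic. The $20 starting price and the $3 × 10^-7 latest price come from the graph as described above; treating "today" as 2012 (just over 20 years after 1990) is my own assumption, as are the variable and function names.

```python
import math

START_YEAR = 1990
START_COST = 20.0      # dollars per base, read off the graph (approximate)
ACTUAL_COST = 3e-7     # dollars per base, latest figure quoted above
YEAR = 2012            # assumption: "just over 20 years" after 1990
HALVING_YEARS = 2.0    # Moore's-law-style halving period


def moore_cost(year: float) -> float:
    """Per-base cost if it had only halved every HALVING_YEARS since START_YEAR."""
    return START_COST * 0.5 ** ((year - START_YEAR) / HALVING_YEARS)


predicted = moore_cost(YEAR)
gap = predicted / ACTUAL_COST
print(f"Moore-style cost in {YEAR}: ${predicted:.4f} per base")
print(f"Overestimate: {gap:,.0f}x (about 10^{math.log10(gap):.1f})")

# How often would the cost have had to halve to fall from START_COST to ACTUAL_COST by YEAR?
halvings = math.log2(START_COST / ACTUAL_COST)
print(f"Implied halving time: {12 * (YEAR - START_YEAR) / halvings:.1f} months")
```

Eleven halvings take $20 down to just under a penny, while the real price fell by nearly 8 orders of magnitude over the same stretch. That implies a halving time of less than a year, not two.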

The dramatic drop in DNA sequencing cost has been made possible by high throughput sequencing technologies that move beyond traditional Sanger sequencing. I'll spare you the details, but know that today's DNA sequencing machines are a generational leap beyond Sanger sequencing. Instead of running individual sequencing reactions, we can run many simultaneously. The image at right shows my very own 454 PicoTiterPlate (it was a gift!). 454 is one of several methods of sequencing DNA in a high throughput manner. From what I've been able to gather, my chip contains 1.6 million hexagonal wells that each hold 75 picoliters. That's 75 trillionths of a liter, orders of magnitude less than a single drop, so each well is far too small to see on its own. If you want to learn more about 454 sequencing, I've placed a link in the references below. The most important thing to remember is that (ideally) one run will give you 1.6 million sequences. This is an amazing tool for decoding genomes, whether it's our own genome or those of a whole population of microbes. And, by the base pair, it's dirt cheap. I fully expect to have my entire genome sequenced in my lifetime, and I'd bet money that yours will be too.
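To put those well volumes in perspective, here is a quick unit-conversion sketch. The well count and per-well volume come from the description above. The size of a "drop" is my own assumption (roughly 50 microliters, a common rule of thumb), so treat the second number as a ballpark figure only.

```python
WELLS = 1_600_000       # wells on the plate, as described above
WELL_VOLUME_PL = 75     # picoliters per well
DROP_VOLUME_UL = 50     # assumption: a typical drop is roughly 50 microliters

PL_PER_UL = 1_000_000   # 1 microliter = 10^6 picoliters

# Combined volume of every well on the plate, in microliters.
total_ul = WELLS * WELL_VOLUME_PL / PL_PER_UL
# How many wells' worth of liquid fit in one (assumed) drop.
wells_per_drop = DROP_VOLUME_UL * PL_PER_UL / WELL_VOLUME_PL

print(f"Total volume of all wells: {total_ul:.0f} microliters")
print(f"Wells' worth of liquid in one drop: {wells_per_drop:,.0f}")
```

Under that assumption, a single drop holds the contents of well over half a million wells. All 1.6 million wells together add up to only a couple of drops.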