A trio of researchers has encoded a draft of a book into DNA. The 5.27-megabit tome contains 53,246 words, 11 JPG image files and a JavaScript program, making it the largest piece of non-biological data ever stored in this way.

DNA has the potential to store huge amounts of information. In theory, two bits of data can be incorporated per nucleotide, the single base unit of a DNA string, so each gram of the double-stranded molecule could store 455 exabytes of data (1 exabyte is 10^18 bytes). Such dense packing outstrips inorganic data-storage devices such as flash memory, hard disks or even storage based on quantum-computing methods.
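The arithmetic behind a figure of this size can be sketched in a few lines. The sketch below assumes an average mass of about 330 grams per mole for one nucleotide, a typical textbook value for single-stranded DNA; that number and the function name are assumptions made for illustration and are not taken from the report.

```python
AVOGADRO = 6.02214076e23           # particles per mole
NUCLEOTIDE_MASS_G_PER_MOL = 330.0  # assumed average mass of one nucleotide (single-stranded DNA)
BITS_PER_NUCLEOTIDE = 2            # four possible bases, so log2(4) = 2 bits each


def exabytes_per_gram(mass_per_nt=NUCLEOTIDE_MASS_G_PER_MOL,
                      bits_per_nt=BITS_PER_NUCLEOTIDE):
    """Theoretical storage capacity of one gram of DNA, in exabytes."""
    nucleotides_per_gram = AVOGADRO / mass_per_nt
    bytes_per_gram = nucleotides_per_gram * bits_per_nt / 8
    return bytes_per_gram / 1e18   # 1 exabyte = 10^18 bytes


print(f"{exabytes_per_gram():.0f} exabytes per gram")
```

Under that assumption the result lands within about one exabyte of the quoted 455. Counting the mass of both strands while storing the same two bits per position would roughly halve it.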

The DNA book, fittingly a treatise on synthetic biology, was encoded by geneticists George Church and Sriram Kosuri at the Wyss Institute for Biologically Inspired Engineering in Boston, Massachusetts, and Yuan Gao, a biomedical engineer at Johns Hopkins University in Baltimore, Maryland. They report their work in Science (ref. 1) this week.

The book marks a significant gain on previous projects — the largest of which encoded less than one-six-hundredth of the data — but organic flash drives are still many years away. There are a number of reasons why the method is not practical for everyday use: for one, both storing and retrieving information currently require several days of lab work, spent either synthesizing DNA from scratch or sequencing it to read the data.

Such limitations will ease over time, predicts Stuart Parkin, who is developing dense forms of inorganic storage media at the IBM-Stanford Spintronic Science and Applications Center in San Jose, California. “This coupling of the biological world to the physical world will lead to some very interesting storage devices in the next decade,” he says.

Short and sweet

Encoding the DNA book didn't require fundamentally new technology so much as the creative application of existing techniques, explains Anne Condon, a computer scientist at the University of British Columbia in Vancouver, who studies how DNA molecules can be used in computing.

Previous attempts to store information in DNA have been held up by difficulties in making perfect long strands. Shorter molecules present less of a challenge, so Church and his colleagues kept their storage strands just 159 nucleotides long, and generated multiple copies of each to make catching and correcting mutations easier.
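One simple way that multiple copies can help catch errors is a majority vote at each position. The sketch below illustrates that idea only; it is not the researchers' actual pipeline, the function name `consensus` is hypothetical, and it assumes that every read has the same length, so it handles substituted bases but not insertions or deletions.

```python
from collections import Counter


def consensus(reads):
    """Return the most common base at each position across equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length")
    # zip(*reads) walks the reads column by column, one position at a time.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


# Three copies of the same short strand, one carrying a substitution at the end.
copies = ["ACGTAC", "ACGTAC", "ACGTAT"]
print(consensus(copies))
```

With three copies, a single substitution in one of them is outvoted by the other two.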

In each single strand, 96 nucleotides represented the encoded data as digital ones and zeroes; 19 nucleotides showed how these data blocks should be ordered; and 44 nucleotides enabled easier sequencing. The researchers' binary code assigned 'zero' to two types of nucleotide (As and Cs) and 'one' to the other two types (Gs and Ts).
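The layout described above can be sketched as a small encoder and decoder. This is a minimal illustration under stated assumptions, not the researchers' code: the function names are hypothetical, the address is placed before the data arbitrarily, the 44 nucleotides used for sequencing are left out, and picking between the two candidate bases so that no base repeats immediately is an illustrative choice that the report does not describe.

```python
import random

ZERO_BASES = "AC"   # either base encodes a 0
ONE_BASES = "GT"    # either base encodes a 1
DATA_BITS = 96      # data bits per strand
ADDRESS_BITS = 19   # bits giving the block's position in the file

BIT_OF_BASE = {"A": "0", "C": "0", "G": "1", "T": "1"}


def bits_to_bases(bits, rng):
    """Map each bit to one of its two bases, avoiding an immediate repeat."""
    bases = []
    for bit in bits:
        candidates = ONE_BASES if bit == "1" else ZERO_BASES
        options = [b for b in candidates if not bases or b != bases[-1]]
        bases.append(rng.choice(options))
    return "".join(bases)


def encode_block(index, data_bits, rng=None):
    """Build the address-plus-data part of one strand (115 nucleotides)."""
    rng = rng or random.Random(0)
    if len(data_bits) != DATA_BITS or set(data_bits) - {"0", "1"}:
        raise ValueError("data must be a string of exactly 96 bits")
    if not 0 <= index < 2 ** ADDRESS_BITS:
        raise ValueError("block index does not fit in 19 bits")
    address = format(index, f"0{ADDRESS_BITS}b")
    return bits_to_bases(address + data_bits, rng)


def decode_block(strand):
    """Recover (block index, data bits) from an encoded strand."""
    bits = "".join(BIT_OF_BASE[base] for base in strand)
    return int(bits[:ADDRESS_BITS], 2), bits[ADDRESS_BITS:]


payload = "01" * 48
strand = encode_block(5, payload)
assert decode_block(strand) == (5, payload)
```

At 96 data bits per strand, a 5.27-megabit file needs roughly 55,000 strands, and a 19-bit address can label up to 524,288 blocks, so the address field leaves ample headroom.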

“It's using some simple ideas in very elegant ways to improve the density of information that one can store,” says Condon. But she says that the technology will work best for specialized applications in which data need to be stored for a long time without being read.

The ideal storage period might be as long as centuries. Even as other current storage technologies become as obsolete as magnetic tape and floppy disks are now, says Kosuri, researchers will always be trying to improve the technology for reading and writing DNA, because the molecule is so central to biology.

That will bring costs down over time, enabling DNA data storage to move beyond the realm of demonstration projects. In the past five years, sequencing technologies have become at least eight times cheaper, says Kosuri, and DNA synthesis has achieved the same gains over the past eight years. “The DNA chip we used for this paper held 55,000 oligonucleotides,” he adds. “The newest ones hold a million.”