Main

The four nucleotides that make up the primary sequence of DNA are not the sole determinants of gene expression: there is a fifth player with a prominent role—methylcytosine, also referred to as the fifth base. Addition of a methyl group to cytosine is associated with stable and heritable repression of transcription.

Finding genes and genomic regions that are silenced during development or disease processes is of great interest to researchers; methylcytosines serve as markers, but precisely locating the distribution of these markers in the whole genome is a challenge.

It was such a fine-scale map of genome-wide methylation sites that Steve Jacobsen and computational biologist Matteo Pellegrini from the University of California at Los Angeles (UCLA) wanted to develop. Their team chose the relatively small genome of the plant Arabidopsis thaliana to develop and hone the necessary tools and techniques (Fig. 1).

Figure 1: Arabidopsis thaliana. (Image: iStockphoto)

Over the years, researchers have used DNA microarrays to locate methylcytosines, but the shortcomings of microarrays left Jacobsen and Pellegrini looking for new techniques. Jacobsen explains: “One of the limitations of arrays is that they are not single-base-resolution.” Single-base resolution was precisely what the scientists needed to analyze the methylation status of every cytosine in the Arabidopsis genome, so they turned to high-throughput sequencing.

To prepare the plant DNA for sequencing, the scientists had two requirements: they needed to distinguish methylated cytosines from the non-methylated versions by their sequence and break the genome into short fragments flanked by the sequencing primers.

Jacobsen's team combined the construction of the short-sequence DNA library with the well-established technique of bisulphite conversion, during which unmethylated cytosines are converted to uracil, and are ultimately read as thymine after amplification, while the methylated cytosines remain unchanged. To ensure complete conversion of cytosines, the researchers ligated primers to their randomly sheared DNA before the bisulphite treatment, and then used a second set of primers, which anneal to the first set only if complete conversion has taken place, to amplify the library. This second set of primers contained a 5-nucleotide tag that would allow them to orient the read and determine whether it came from the sense or the antisense strand.
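The effect of bisulphite conversion on a sequence can be illustrated with a minimal in-silico sketch. This is not the team's pipeline: the function name `bisulfite_convert` and the `methylated` position set are hypothetical, chosen only to show why a methylated cytosine stays readable as C while an unmethylated one is read as T.

```python
def bisulfite_convert(seq, methylated):
    """Return the sequence as it would be read after bisulphite
    treatment and amplification.

    seq        -- DNA string for one strand, 5' to 3'
    methylated -- set of 0-based positions of methylated cytosines
                  (hypothetical input, for illustration only)
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            # Unmethylated C is converted to U and read as T.
            out.append("T")
        else:
            # Methylated C and all other bases are unchanged.
            out.append(base)
    return "".join(out)


original = "ACGTCCGA"
# Assume only the cytosine at position 1 is methylated.
converted = bisulfite_convert(original, methylated={1})
print(original)
print(converted)
```

Comparing the two printed strings position by position shows which cytosines were protected by methylation, which is the signal that sequencing then reads out.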

To efficiently deal with the enormous amount of data generated by high-throughput sequencing technology, Jacobsen and Pellegrini developed algorithms that improved the quality of the base calls made during the actual sequencing procedure, allowed better mapping of the reads to the genome and filtered out any reads that still contained unconverted cytosines.

They installed filters in their analysis program that eliminated all reads that did not uniquely map to the genome and ended up with a DNA methylation map that comprised 84% of the plant genome.
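The idea of keeping only uniquely mapping reads and then reading off methylation at each cytosine can be sketched as follows. This is a toy illustration under stated assumptions, not the UCLA algorithms: it uses exact matching in a C-to-T converted ("three-letter") space, handles the forward strand only, and the names `c_to_t`, `find_unique_hit` and `call_methylation` are hypothetical.

```python
def c_to_t(seq):
    """Collapse C onto T so converted reads can be compared to the reference."""
    return seq.upper().replace("C", "T")


def find_unique_hit(read, reference):
    """Return the 0-based start of the read if it matches exactly one
    place in the converted reference, otherwise None."""
    conv_ref = c_to_t(reference)
    conv_read = c_to_t(read)
    hits = []
    start = conv_ref.find(conv_read)
    while start != -1:
        hits.append(start)
        # Step by one so overlapping matches are also counted.
        start = conv_ref.find(conv_read, start + 1)
    return hits[0] if len(hits) == 1 else None


def call_methylation(read, reference):
    """Map reference cytosine positions covered by the read to True
    (read shows C, methylated) or False (read shows T, unmethylated).
    Returns None for reads that do not map uniquely."""
    pos = find_unique_hit(read, reference)
    if pos is None:
        return None
    calls = {}
    for offset, read_base in enumerate(read.upper()):
        if reference[pos + offset].upper() == "C":
            calls[pos + offset] = (read_base == "C")
    return calls


reference = "GACGTACCATGA"
# Toy read covering positions 1-8, with only the cytosine at position 2 retained.
print(call_methylation("ACGTATTA", reference))
# A short read that matches two places in converted space is discarded.
print(call_methylation("TA", reference))
```

A real pipeline would also need to handle the reverse strand, sequencing errors and mismatches; the sketch only shows why non-unique reads have to be dropped before any per-cytosine call can be trusted.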

The results speak to the increased sensitivity of this bisulphite-sequencing method over microarray-based techniques. The team at UCLA was able to find new methylation sites in genes previously classified as unmethylated; they mapped methylation across highly repetitive ribosomal DNA loci and accurately detected methylated promoters.

Of course, such high-resolution methylation mapping is of interest not only to the plant community. Jacobsen is certain that their approach is transferable to higher organisms such as mouse and human. He sees the high cost of sequencing large genomes as the main limitation at this point but adds confidently: “sequencing technologies are improving their throughput at a fast pace, so this technique will be practical quickly.”

Detailed methylation patterns may soon be as self-evident a resource as primary genomic sequences are at the moment.