Targeting portions of the transcriptome for deep sequencing reveals very rare transcripts.
Deep sequencing of any transcriptome generates many spurious reads. There is spirited debate over the extent to which these scattered staccato signals, which are easy to dismiss as noise, are a sign of pervasive genomic transcription. Motivated by this question, John Rinn, John Mattick and colleagues working at Harvard University and the University of Queensland have come up with RNA CaptureSeq, a strategy that ramps up sensitivity by targeting only a specific portion of the transcriptome. With Jeffrey Jeddeloh of Roche NimbleGen, they designed tiling arrays to select genomic regions for sequencing at fantastic depths, gaining enough coverage to confidently detect the rarest of transcripts.
As in targeted genomic capture, complementary DNA is hybridized to overlapping probes on the arrays, which traps just a portion of the transcriptome for sequencing. The same amount of sequencing therefore samples the original RNA more deeply, an effect Mattick likens to a “transcriptomic microscope.” To test their idea, the researchers designed an array for ∼50 human genes, a few long noncoding RNAs (lncRNAs) and nearly 1,000 intergenic regions bearing epigenetic marks of active transcription. Applying the method to human fibroblast RNA, they achieved over 4,600-fold coverage, or a ∼380-fold enrichment of reads that map to targeted regions.
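To make the enrichment figure concrete: fold enrichment is commonly computed as the fraction of reads mapping to targeted regions in the captured library, divided by the same fraction in an uncaptured control. The sketch below assumes that definition, which the article does not spell out. The function name and the read counts are hypothetical, chosen only for illustration and not taken from the study.

```python
def fold_enrichment(on_target_captured, total_captured,
                    on_target_control, total_control):
    """Ratio of on-target read fractions: captured library vs. uncaptured control.

    Assumed definition (not stated in the article):
        (on_target_captured / total_captured) / (on_target_control / total_control)
    """
    if total_captured <= 0 or total_control <= 0:
        raise ValueError("total read counts must be positive")
    if on_target_control <= 0:
        raise ValueError("control needs at least one on-target read")
    # Cross-multiply so the ratio is computed with a single division.
    return (on_target_captured * total_control) / (on_target_control * total_captured)


# Hypothetical read counts, for illustration only (not data from the study):
# 76% of captured reads on target versus 0.2% of control reads on target.
example = fold_enrichment(
    on_target_captured=760_000, total_captured=1_000_000,
    on_target_control=2_000, total_control=1_000_000,
)
print(f"fold enrichment: {example:.0f}")
```

Under this definition, the enrichment depends only on the two on-target fractions, so libraries sequenced to different total depths can still be compared directly.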
What were the results of this closer scrutiny? “The transcriptional complexity of the genome is far greater than people thought,” says Mattick. “It operates at a much finer scale.” The researchers found over 200 new protein isoforms, including four from the heavily studied p53 tumor suppressor locus. They also detected 163 new lncRNAs near protein-coding regions and gathered robust evidence of lncRNA splicing.
Results from intergenic regions were no less dramatic. The researchers detected transcription from nearly every base, and almost half of the regions gave rise to polyadenylated messages with conventional structure including introns, exons and features typical of lncRNAs. Another key finding was that many of these transcripts were exceedingly rare: present at less than one copy per 1,000 cells. “Each cell is transcriptionally unique,” says Mattick, pointing to the scale of cellular heterogeneity that these results imply for development. “It's not 200 cell types; it's a hundred trillion cells doing precise things in different places.”
Beyond informing our general view of the transcriptome, the team sees great potential in the method. Tim Mercer, who led the work, notes that many applications can benefit from additional depth. The group is exploring its application to genome-wide association studies that link multiple complex diseases to genomic regions lacking annotated genes. RNA CaptureSeq offers a way to reanalyze these regions for functional changes in rare transcripts. Likewise, small RNAs can be comprehensively mapped using the technique, even if they are expressed in only a minority of cells. The savings in sequencing bandwidth also facilitate greater multiplexing of sequencing experiments.
“It is almost impossible to provide both a comprehensive and deep view of the transcriptome at current sequencing capacity,” says Mercer. Targeted capture can help solve this by focusing on regions of interest with high sensitivity. Many aspects of the transcriptome certainly bear taking such a deep look, with the potential to radically change how we see fundamental concepts such as cell identity and genomic activity.
Mercer, T.R. et al. Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat. Biotechnol. advance online publication (13 November 2011).