Analysis of 5′ transcript heterogeneity by high-throughput sequencing of cDNA

Spanu, Pietro D; Doyle, Ken

doi:10.1038/nmeth.f.257

Advertising Feature: Application Note
Published: July 2009

Nature Methods volume 6, pages i–ii (2009)

Abstract

We isolated mRNA from Blumeria graminis and prepared a cDNA library using a modification of Epicentre's ExactSTART™ Full-Length cDNA Library Cloning kit. Analysis of the 5′ ends of the cDNA by 454 pyrosequencing yielded approximately 250,000 expressed sequence tags (ESTs) from one run of sequencing. The data also showed marked heterogeneity of the 5′ ends of the transcripts, including the addition of non–template-encoded bases.

Main

Epicentre's ExactSTART technology enables the user to selectively tag the exact 5′ end of any RNA species present in a total RNA population. The ExactSTART Full-Length cDNA Library Cloning kit was originally designed to create a directionally cloned, full-length cDNA library for the identification of transcription start sites and other analyses of the 5′ ends of transcripts. For example, the kit has been used to discover alternative transcription initiation sites in a cDNA library produced from Saccharomyces cerevisiae (Vaidyanathan, R. et al. ExactStart™ Full-Length cDNA Library Cloning Kit: a rapid and efficient method to synthesize full-length cDNA for cloning and accurate mapping of transcription initiation and polyadenylation sites. EPICENTRE Forum 14.2, 4–5; 2007).

The powdery mildew fungus Blumeria graminis is one of the most important pathogens of cereal crops and can reduce crop yields by as much as 40%. One experimental challenge posed by B. graminis is that it can be grown only on its host; thus, the supply of biological material is very limited and may be contaminated by host tissues. Genome annotation requires a collection of full-length cDNA sequences from as many diverse stages of the organism as possible. High-throughput DNA sequencing platforms have greatly increased the depth at which transcriptomes can be analyzed, so robust and efficient protocols for generating cDNA that can be introduced directly into the sequencing pipeline are very valuable. Here we describe minor modifications that make the ExactSTART technology compatible with 454 sequencing (Fig. 1).

Preparation of cDNA for sequencing

We dissected epiphytic mycelia from barley leaves infected with Blumeria graminis f. sp. hordei using procedures previously described¹ and extracted total RNA from the fungal structures obtained from approximately 200 infected primary leaves², yielding approximately 150 μg of total RNA. Of this sample, we processed 40 μg of total RNA using Epicentre's mRNA-ONLY™ Eukaryotic mRNA Isolation kit to remove non-mRNA species.

We then synthesized cDNA following the ExactSTART Full-Length cDNA protocol (Fig. 1). The 5′ cap structure was removed from the mRNA using tobacco acid pyrophosphatase in a 10 μl reaction. In the following step, the RNA acceptor oligo was replaced with a custom-made oligoribonucleotide compatible with the 454 Adaptor A sequence (5′-GCCUCCCUCGCGUUAUCAGA-3′) and ligated to the decapped mRNA. Next, cDNA was synthesized using 20 μl of the ligated RNA sample directly in the first-strand synthesis reaction, using a custom-made primer containing the 454 Adaptor B sequence (SAD-R poly(T): 5′-GCCTTGCCAGCCCGCTCAG(T)₂₅-3′). Second-strand cDNA synthesis and PCR amplification were carried out using Phusion DNA polymerase (NEB) in a 100 μl reaction. The primers used in the amplification reaction were modified to render them compatible with the 454 pyrosequencing protocol (SAD-For: 5′-GCCTCCCTCGCGCCATCAGA-3′; SAD-Rev: 5′-GCCTTGCCAGCCCGCTCAGT-3′). The cDNA yield was approximately 7 μg, and gel electrophoresis showed a DNA smear with a modal size of around 900 bp (Fig. 2). We sent the DNA to Roche Diagnostics for 454 pyrosequencing (one run on GS-FLX). The sample yielded 247,306 reads totaling 50.8 megabases, corresponding to an average read length of 205 bases. The data were assembled with the MIRA assembler (http://www.chevreux.org/projects_mira.html), clustered and combined with the expressed sequence tags (ESTs) available in our own databases and in public repositories. This increased the number of unique B. graminis genes identified by cDNA sequencing from 4,584 to 7,727.
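The run statistics quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative, and the 50.8 megabase total is itself a rounded figure, so the computed mean is approximate.

```python
# Figures quoted in the text for the single GS-FLX run
reads = 247_306
total_bases = 50.8e6  # 50.8 megabases (rounded in the text)

# Mean read length = total bases / number of reads
mean_read_length = total_bases / reads

# Unique B. graminis genes identified before and after adding the 454 data
genes_before = 4_584
genes_after = 7_727
new_genes = genes_after - genes_before
percent_increase = 100 * new_genes / genes_before

print(f"mean read length: {mean_read_length:.1f} bases")
print(f"newly identified genes: {new_genes} (+{percent_increase:.1f}%)")
```

The mean read length rounds to the 205 bases reported, and the gene count rises by roughly two-thirds relative to the earlier EST collection.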

**Figure 2: Size range of cDNA produced.**

Analysis of 5′ heterogeneity

When we compared the cDNA sequences to genomic DNA sequences, it became evident that there was marked heterogeneity in the RNA populations. This was true both for the length of the sequence (possibly reflecting different starts of transcription and/or processing) and for the sequence of bases itself. Figure 3 illustrates the heterogeneity for one gene, which is representative of these findings. From this it is clear that some non-template-encoded bases were added to the transcript at a considerable distance from the beginning of the mature transcript. The majority of these were adenosines, but thymidine, cytidine and guanosine were also found. The raw data from the 454 pyrosequencing included four bases from the SAD-For primer, and these were sequenced accurately without exception, which argues against sequencing error as the source of the variation. We therefore conclude that the heterogeneity found at the 5′ end of the mRNA reflects the situation in vivo, possibly because of inaccurate transcription by the RNA polymerase in the very first stages of the process. This phenomenon has also been noted in studies of related fungi (for example, Magnaporthe grisea³) and other eukaryotic organisms⁴.
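The comparison described above can be sketched in code: check the primer-derived bases at the start of each read, then find where the remainder of the read first matches the genome, treating any bases before that point as 5′ additions. This is a minimal illustration, not the analysis actually performed. Several things are assumptions: the four primer bases are taken to be the last four of SAD-For (the text does not say which four), `GENOME` and `READS` are invented toy sequences, `split_read` is a hypothetical helper, and exact seed matching stands in for a real alignment that would tolerate sequencing errors.

```python
from collections import Counter

SAD_FOR = "GCCTCCCTCGCGCCATCAGA"  # SAD-For primer, as given in the text
PRIMER_TAG = SAD_FOR[-4:]         # ASSUMPTION: which four primer bases appear in reads

# Toy genomic region and reads, invented purely for illustration
GENOME = "CGTACGGATCCTTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGCGAATTC"
READS = [
    PRIMER_TAG + GENOME[10:40],         # templated start at position 10
    PRIMER_TAG + "A" + GENOME[10:40],   # one extra A before position 10
    PRIMER_TAG + "AA" + GENOME[12:40],  # two extra As before position 12
    PRIMER_TAG + "G" + GENOME[12:40],   # one extra G before position 12
]


def split_read(read, genome, seed=10):
    """Return (tag_ok, extra_5prime, genome_start) for one read.

    extra_5prime holds the leading bases (after the primer tag) that come
    before the first seed-length window found verbatim in the genome.
    """
    tag_ok = read.startswith(PRIMER_TAG)
    body = read[len(PRIMER_TAG):] if tag_ok else read
    for k in range(len(body) - seed + 1):
        pos = genome.find(body[k:k + seed])
        if pos != -1:
            return tag_ok, body[:k], pos
    return tag_ok, None, -1  # no exact seed match anywhere in the genome


results = [split_read(r, GENOME) for r in READS]

# Tally the added bases and the distinct genomic start positions
base_counts = Counter(b for _, extra, _ in results if extra for b in extra)
start_counts = Counter(pos for _, _, pos in results if pos != -1)

for tag_ok, extra, pos in results:
    print(f"tag ok: {tag_ok}  extra 5' bases: {extra!r}  genome start: {pos}")
print("added bases:", dict(base_counts))
print("start sites:", dict(start_counts))
```

Separating the two tallies mirrors the two kinds of heterogeneity reported: variation in where the transcript starts and variation in the bases found ahead of the templated sequence. A leading base that happens to match the genome immediately upstream would be counted as templated by this sketch, so real data would need a proper aligner and a more careful definition of the start site.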

**Figure 3: The sequence heterogeneity found at the 5′ end of the sequenced transcripts for a randomly chosen group of sequences corresponding to the same genetic locus.**

Conclusions

As shown here, the ExactSTART procedure can be easily adapted to produce full-length cDNA for high-throughput sequencing analysis by modifying the tagging oligonucleotides and PCR amplification primers. The double-stranded, amplified cDNA produced can be directly used in 454 sequencing, providing valuable information about the 5′ heterogeneity associated with transcriptional start sites in many organisms.

References

1. Both, M. et al. Transcript profiles of Blumeria graminis development during infection reveal a cluster of genes that are potential virulence determinants. Mol. Plant-Microbe Interact. 18, 125–133 (2005).
2. Chomczynski, P. & Sacchi, N. Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162, 156–159 (1987).
3. Gowda, M. et al. Robust analysis of 5′-transcript ends: a high-throughput protocol for characterization of sequence diversity of transcription start sites. Nat. Protoc. 2, 1622–1632 (2007).
4. Gowda, M. et al. Robust analysis of 5′-transcript ends (5′-RATE): a novel technique for transcriptome analysis and genome annotation. Nucleic Acids Res. 34, e126 (2006).

Acknowledgements

The data presented in this application note were obtained in collaboration with T.A. Burgis and J.C. Abbott, Imperial College London.

Author information

Authors and Affiliations

Department of Life Sciences, Imperial College London, South Kensington Campus, London, UK
Pietro D Spanu
Epicentre Biotechnologies, Madison, Wisconsin, USA
Ken Doyle

Corresponding author

Correspondence to Ken Doyle.

Additional information

Disclaimer

This article was submitted to Nature Methods by a commercial organization and has not been peer reviewed. Nature Methods takes no responsibility for the accuracy or otherwise of the information provided.

About this article

Cite this article

Spanu, P., Doyle, K. Analysis of 5′ transcript heterogeneity by high-throughput sequencing of cDNA. Nat Methods 6, i–ii (2009). https://doi.org/10.1038/nmeth.f.257

Issue Date: July 2009
DOI: https://doi.org/10.1038/nmeth.f.257
