A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing

Chen, Shi-Yi; Deng, Feilong; Jia, Xianbo; Li, Cao; Lai, Song-Jia

doi:10.1038/s41598-017-08138-z

Download PDF

Article
Open access
Published: 09 August 2017

A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing

Shi-Yi Chen¹^na1,
Feilong Deng¹^na1,
Xianbo Jia¹,
Cao Li¹ &
…
Song-Jia Lai¹

Scientific Reports volume 7, Article number: 7648 (2017) Cite this article

6519 Accesses
71 Citations
8 Altmetric
Metrics details

Subjects

Abstract

It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, it still remains a huge challenge for obtaining full-length transcripts because of difficulties in the short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). We totally obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not been annotated yet within the current reference genome. Furthermore, about 17% of transcripts are computationally revealed to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events are detected within this de novo constructed transcriptome, respectively. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of rabbit genome.

Comprehensive transcriptome characterization of Grus japonensis using PacBio SMRT and Illumina sequencing

Article Open access 14 December 2021

Wentao Ye, Wei Xu, … Hongyi Liu

Full-length transcriptome sequencing from multiple tissues of duck, Anas platyrhynchos

Article Open access 21 November 2019

ZhongTao Yin, Fan Zhang, … Zhuo-Cheng Hou

The Single-molecule long-read sequencing of Scylla paramamosain

Article Open access 27 August 2019

Haifu Wan, Xiwei Jia, … Yilei Wang

Introduction

Rabbit (Oryctolagus cuniculus) is an important domestic mammal and placed within the family Leporidae of the order Lagomorpha¹. Due to the closely phylogenetic relatedness to human being, short vital cycle, less aggressive behavior and other advantages, rabbit has been widely used as animal model in biomedical researches². In particular, rabbit resembles human closely with respect to lipoprotein metabolism and is therefore thought to be a preferred animal model for studying human hypercholesterolemia³. In addition to the mechanistic investigation of human diseases, rabbit is involved in the experimental studies of surgery⁴. As a monogastric herbivore, rabbit can efficiently utilize plant proteins that are indigestible to human and are economically raised for providing meat, fur and wool products in many parts of the world, especially in China.

Rabbit genome had been sequenced in ~7X depth with the current assembly of 2.66 Gb in size (OryCun2.0)¹, for which Ensembl genebuild pipeline totally predicted 22,668 genic loci consisting of 24,964 transcripts in Release 85. However, most of the existing gene models are just derived from in silico prediction with lack of the reliable annotation on alternative isoforms and untranslated regions, which would prevent accurate evaluation of transcriptome complexity. The improved annotation of genome, therefore, is necessary to facilitate biological researches in rabbit. High-throughput RNA sequencing approach considerably alleviates the genome-wide investigation of gene transcription, alternative isoforms and expression level⁵. However, it has still remained challenging to reliably assemble transcripts in full-length from the short reads, which are essential to characterize the post-transcriptional processes, such as alternative splicing (AS) and alternative polyadenylation (APA) events.

Recently, the single-molecule long-read sequencing technology from Pacific Biosciences (PacBio) provides a better alternative to sequencing full-length cDNA molecules, which has been successively used for whole-transcriptome profiling in human and other species^6,7,8,9. The PacBio long-read sequencing was also specially used for predicting and validating gene models in eukaryotes¹⁰. In comparison with short read sequencing, the methodological advantages of PacBio RNA sequencing mainly include better completeness to sequence both 5′ and 3′ ends of cDNA molecules, higher accuracy to identify alternative isoforms, and also the increased power to distinguish RNA haplotypes^{6, 11}. However, the method of PacBio long-read sequencing could not be directly used to quantify gene expression for the moment. Here, we employ this technology to sequence polyadenylated RNAs of rabbit and provide a transcriptome-wide landscape in relation to gene models and alternative isoforms.

Results

PacBio sequencing and error correction of long reads

From three New Zealand white rabbits at different stages of age, we collected RNA samples of seven organs or tissues and then equally pool them together for library preparation and sequencing. About 61% of zero-mode waveguides, which are referred to as individual sequencing units, were productive among all 13 SMRT cells (Supplementary Table S1). We totally obtained 1,194,102 polymerase reads with full passes > = 0 and the predicted consensus accuracy >0.75, among which 802,358 ROIs (Reads Of Insert) were successfully extracted with mean length of 2,770 bp, quality of 0.92, and 12 passes (Table 1). Length distributions of raw ROIs were consistent with anticipation among the five constructed libraries (Supplementary Figure S1). All ROIs were further classified into 466,034 full-length (FL) non-chimeric and 316,000 non-full-length (nFL) sequences with differential patterns of length distribution (Fig. 1). Based on clustering algorithm of IEC (Iterative Clustering for Error Correction)¹², we finally got 156,838 consensus isoform sequences.

Table 1 PacBio libraries, SMRT cells and sequencing results.

Full size table

In parallel, we sequenced ~137 million paired-end reads on Illumina platform, from which ~120 million clean reads were generated after quality filtering. These short reads were subsequently used for correcting the consensus isoform sequences of PacBio (Fig. 1), which revealed that a total of 135,178 sequences (86.2%) were deduced to contain erroneous fragment(s) with terminal and/or inner distributions and therefore corrected. However, the length proportions of erroneous fragment were relatively low with the median of 8%. Also, there were 21,423 error-free and 237 completely erroneous isoform sequences, respectively.

Construction of transcriptomes

The corrected isoform sequences were mapped against reference genome and totally generated 127,037 raw unique alignments. Among them we first filtered out 30,252 potential false alignments using SpliceGrapher¹³, and follow which the sequences with aligned coverage lower than 0.95 were also discarded. A total of 59,051 clean unique alignments were finally retained and clustered into 14,474 independent genic loci. We further removed 22,826 redundant short alignments, which have differences of both <3 bp on intronic coordinates and <15 bp on downstream coordinate of the most 3′ exon. After removal of 39 singleton transcripts, we finally obtained the de novo constructed transcriptome that consists of 36,186 transcripts from 14,474 genic loci. Among them 6,418 genic loci (44.3%) had two or more alternative isoforms and 4,164 transcripts (11.5%) were transcripted from single exon (Supplementary Figure S2). For transcripts with multiple exons, median and mean of the most 5′ exons were 140 bp and 201 bp in length, respectively.

Based on Illumina short reads, a total of 46,871 transcripts from 37,850 genes were assembled by the genome-guided method of Cufflinks¹⁴. Meanwhile, Trinity¹⁵ de novo assembled 571,891 transcripts, 73.8% of which (422,292) were successfully mapped to reference genome.

Alternative splicing and polyadenylation

Within the de novo constructed transcriptome by PacBio reads, we totally detected 24,797 AS events (Table 2), including 3,479 intron retention (IR), 7,096 exon skipping (ES), 6,906 alternative 5′ sites (Alt. 5′) and 7,316 alternative 3′ sites (Alt. 3′). By contrast, only 2,398 AS events were observed in total within the reference gene models of rabbit in Ensembl, which was in order of magnitude fewer than PacBio transcriptome. After merging PacBio and Ensembl transcripts together, a total of 34,173 AS events were found with obvious increases for each of all four types. Within PacBio transcriptome we further determined 11,184 APA events at 3,492 genic loci. In Fig. 2, we arbitrarily selected five genes and schematically illustrated various alternative isoforms in comparison with reference gene models in Ensembl. The APA events were also specially exemplified in Supplementary Figure S3.

Table 2 Analyses of alternative splicing.

Full size table

Among the five examples in Fig. 2, we selected two known (A and C) and one novel (E) genes, all of them have complex AS events as being revealed by PacBio transcripts, and further verified the reliability of alternative isoforms according to mapping evidences of Illumina short reads (Supplementary Figure S4). It was clearly revealed that alternative isoforms of these three genes were reliably supported by the relatively varied coverages of short reads.

Comparison with Ensembl annotation

After being compared with reference annotation of rabbit genome in Ensembl, a total of 3,334 genic loci consisting of 3,637 transcripts within PacBio transcriptome had not been annotated yet and were believed to be novel. Furthermore, there were 12,112 transcripts having identical intronic coordinates with reference annotations. On the whole, these novel transcripts showed shorter length with the predominant distribution between 1,000 and 2,000 bp (Supplementary Figure S5).

Classification of non-coding RNAs

According to homologous searches against reference protein database, 30,183 and 6,003 transcripts were believed to be the protein-coding and non-coding RNAs, respectively. We additionally observed that non-coding transcripts have less exons, lower expression levels, and slightly higher ratio of exon to intron in length than that of protein-coding transcripts (Fig. 3). Genomic distributions of non-coding transcripts were also classified into 1,794 intergenic and 3,558 genic locations with respective to the protein-coding transcripts that were newly detected in the present study and already annotated within reference genome in Ensembl (Table 3); the two classes could be further divided into different sub-classes according to the recently proposed definitions by Wucher and colleagues¹⁶.

Table 3 Classification of non-coding transcripts.

Full size table

Recovery of paralogous genes

We selected ten Major Histocompatibility Complex (MHC) paralogous genes, all of which are adjacently distributed within a 1.2-Mbp region on chromosome 12 (Fig. 4), to exemplify the power of PacBio long-read sequencing to identify the paralogues. With an exception of HLA-A, all gene structures were well recovered by PacBio transcripts in comparison with reference annotation. Moreover, PacBio transcripts also supported many novel isoforms that have not been annotated yet. By contrast, performances of the assembled transcripts from short reads by genome-guided method were obviously lower than PacBio transcripts in terms of both the recovery of gene structure and number of isoforms. All of these paralogous genes were poorly assembled using the de novo approach, which was very apt to produce the fragmented and confusing transcripts.

Discussion

After the initial domestication occurred in southern France ~1,400 years ago¹, modern rabbits are now raised for laboratory animal, meat, fur and wool throughout the world. In comparison with other domestic animals such as pig¹⁷ and cattle¹⁸, little attention has been paid to genomic biology of rabbit despite the fact that the draft genome was early sequenced and publicly available in 2009. Recently, an international project titled “LaGomiCs” has been proposed for sequencing genomes of all species within the order Lagomorpha, which is expected to considerably increase our understanding of important biological problems in this and related taxonomic clades¹⁹. In the present study, we employed PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit and generated a large number of gene models and alternative isoforms that have not been annotated yet. The results therefore expand the reference set of gene transcriptions in rabbit. Of course, this profiling of rabbit transcriptome would not be exhaustive mainly because the limited biological samples were sequenced. Because our aim is to provide a general encyclopedia of gene transcriptions, we therefore sequenced the pooled RNA samples from different organs or tissues together with the reduced sequencing cost.

The methodological strengths of PacBio RNA sequencing were comprehensively investigated in human⁶, which had been proposed to be superior to methods of short read sequencing mainly due to the advantage for obtaining full-length transcripts. Furthermore, the accuracy of PacBio transcripts for identifying AS and APA events had been experimentally verified^{8, 9}. In the present study, we also showed the obviously improved power of PacBio transcripts for recovering the highly homologous sequences among ten MHC genes than the assembled transcripts from short reads. The length distribution of the most 5′ exons of our PacBio transcripts is consistent with former report in human⁶, which would indicate the comparable sequencing completeness in rabbit.

Human transcriptome was revealed to be much more complex than previously believed due to the application of PacBio long-read sequencing technology, which produced a bulk of novel isoforms that had not yet been annotated⁶. In a similar study for sequencing chicken transcriptome, ~10,000 novel isoforms and genic loci were also found to be absent from reference annotation⁷. For our newly sequenced transcriptome in rabbit, ~33,000 isoforms were transcripted from known genic loci and most of them have not been included within the reference annotation in Ensembl. Additionally, more than 3,000 genic loci were found to be novel. Because we designed strict criteria (see Materials and Methods) for filtering the genomic alignments of PacBio reads to reduce false positives, the actual complexity of rabbit transcriptome would be higher than the current observation.

About 17% of all newly generated transcripts in the present study were deduced to be non-coding RNAs, which is similar to the former study⁶. However, the occurrences of false positives of non-coding transcripts could not be absolutely excluded because this conclusion just depends on the computational approach of homologous search against reference protein database. Derrien and colleagues²⁰ reported that the long non-coding RNAs in human are characterized by less exons and lower expression than protein-coding RNAs, both of which were also detected in the present study. However, we observed a relatively higher proportion of single-exon transcripts among non-coding RNAs than that in human²⁰, for which one possible explanation is that many low-expressed multiple-exon RNAs could not be detected because of the limited sequencing depth of PacBio technology. Furthermore, it would also result into a slight difference of exon/intron length between protein-coding and non-coding RNAs observed in the present study.

In contrast to the merit for producing longer reads, PacBio long-read sequencing technology is also characterized by the relatively high error rate²¹, for which, therefore, many computational approaches have been proposed for error correction in aid of the short reads^{22, 23}. Accordingly, we sequenced the short reads in parallel to correct PacBio reads; and analysis results also revealed that more than 80% of PacBio reads contain erroneous fragments more or less. Therefore, it is suggested that the prior error correction of PacBio reads is necessary.

Materials and Methods

Ethics statement

The study design was approved and all methods were performed in accordance with guidelines of Institutional Animal Care and Use Committee in College of Animal Science and Technology, Sichuan Agricultural University.

Collection of samples and RNA preparation

We collected three healthy female New Zealand white rabbits at 21, 49 and 84 days of age, respectively. Seven organs or tissues consisting of brain, heart, lung, liver, spleen, skeletal muscle from hind leg, and intestinal sacculus rotundus were sampled for each rabbit. Subsequently, all 21 samples were subjected to RNA extraction using RNAiso Pure RNA Isolation Kit (TaKaRa, Japan), which was followed by treatment of DNaseI. NanoVue Plus was used to assess concentration and quality of the extracted RNAs. Finally, one microgram for each RNA sample was equally pooled together and subjected to PacBio single-molecule long-read sequencing (Pacific Bioscience, Menlo Park, USA) and Illumina PE150 sequencing (Illumina, San Diego, USA) in parallel.

Library construction and PacBio sequencing

According to the official protocol, one microgram of RNA was reversely transcribed using Clontech SMARTer cDNA synthesis kit. After PCR amplification, quality control and purification, we performed size selection using BluePippin Size Selection System protocol and herein produced five libraries corresponding to fragments of 0–1, 1–2, 2–3, 3–6 and 5–10 kb in length, respectively. The cDNA products were then subjected to construction of SMRTbell Template libraries using SMRTBell Template Prep Kit. Finally, a total of 13 SMRT cells were sequenced on PacBio RS II platform using P6-C4 chemistry with 4–6 h movies.

Illumina short-read sequencing and transcriptome assembly

The Illumina library was prepared using NEBNext UltraTM RNA Library Prep Kit (E7530L) for Illumina (NEB, USA). Briefly, polyadenylated RNA was isolated and fragmented into ~200 bp fragments. The first-strand cDNA was synthesized using random hexamer-primers, which was followed by synthesis of the second strand. The purified and repaired double-stranded cDNA fragments were selected by size. The amplified mRNA libraries were finally sequenced on Illumina HiSeq X Ten platform for generating 150 bp paired-end reads. Raw short reads were subjected to quality filtering using NGS QC Toolkit v2.3.3²⁴, for which we trimmed the first five bases from the 5′ end of read and removed reads consisting of the low quality bases (QA ≤ 30) >20% or ambiguous bases >1%.

The whole pipeline of bioinformatic analyses involved in the present study is outlined in Fig. 5. In addition to correction of PacBio reads, Illumina short reads are used for independently assembling transcripts by tools of Cufflinks v2.2.1¹⁴ and Trinity v2.3.0¹⁵, respectively. Briefly, all clean short reads were first mapped to reference genome of rabbit using Tophat2 v2.1.1²⁵ with the following parameters: alignment with no more than 3 mismatches (−N 3), no more than 2 gaps (–read-gap-length 2) and no more than 4 edit distances (–read-edit-dist 4). Subsequently, we assembled transcripts using Cufflinks with the default parameters¹⁴. With the default parameters, transcripts were de novo assembled using Trinity¹⁵ and further mapped against reference genome using Blat v35²⁶.

Error correction of PacBio reads

According to the official protocol, raw Polymerase reads that have full passes > = 0 and the predicted consensus accuracy >0.75 were selected for producing ROIs. After the ROIs shorter than 50 bp in length were discarded, they were classified into FL and nFL transcript sequences according to whether 5′/3′ cDNA primers and poly(A) tail were simultaneously observed or not. We subsequently employed three strategies for improving accuracy of PacBio reads. First, only FL ROIs were selected due to the relatively high reliability. Second, all sequences were subjected to isoform-level clustering by IEC algorithm and herein produced consensus sequences of isoform¹². Finally, the isoform sequences were corrected in aid of Illumina short reads using LoRDEC tool v0.6 with -k 21, -s 3, and default setting for other parameters²³.

Mapping and transcriptome construction by PacBio reads

The corrected isoform sequences were aligned against reference genome using GMAP aligner v2016-08-24²⁷ with the parameters: –min-identity 0.95 and –allow-close-indels 2. The sequences with multiple and chimeric alignments were excluded from the following analyses. Among raw unique alignments, we first filtered out false splice junctions using SpliceGrapher v0.2.4¹³, in which the models of splice sites were designed as donors of GT, GC or AT, and acceptor of AG or AC, respectively. The filtered alignments were clustered into independent genic loci such that every locus consists of the overlapped sequences with each other. Within each locus, alignments that have same coordinates on both intronic boundaries and 3′ end of the last exon were collapsed into unique sequence and the longest transcript was selected as representative of this isoform. For locus having multiple isoforms we also discarded the singleton transcript, which is defined that a transcript doesn’t have any splice junction being shared with others. When all sequences within a locus are singleton transcripts, only the longest transcript is retained. Here, we got the high-confidence alignments of isoform sequences and by which the full transcriptome was successfully de novo constructed.

Transcriptome analyses

For the de novo constructed transcriptome by PacBio reads, transcriptome-wide AS events were analyzed using SpliceGrapher¹³, which predicts four AS types consisting of IR, ES, Alt. 5′ and Alt. 3′. Among them, IR is inferred that one intron is retained within a longer exon and simultaneously flanked by two shorter exons. ES is defined when an exon is absent in some transcripts but present in others. When an intron is excised at more than one sites and linked to its 5′ or 3′ exons with different boundaries, they are considered as the Alt. 5′ and Alt. 3′, respectively. We further detected occurrence of APA when the aligned transcripts have same intronic coordinates but differ in downstream coordinate of the last exon with >15 bp in length.

The newly constructed transcriptome was first compared with reference annotation of rabbit genome in Ensembl (Release 85). The PacBio transcripts are classified as derivation from the known genes if the aligned strand and chromosomal locations overlap with any already annotated gene model; otherwise the sequences are transcribed from novel genic loci. For sequences being transcripted from known gene, a transcript was further defined as novel isoform if its intronic coordinates could not overlap any of the existing reference transcripts.

We further distinguished all PacBio transcripts between protein-coding and non-coding RNAs using dbHT-Trans tool v1.0²⁸, which deduces all potential open reading frames of sequence and subsequently translate into amino acids for homologous search against reference database. If a transcript could computationally encode protein that has one or more known homologs, it is regarded as protein-coding RNA and vice-versa. In the present study, we searched against the reference proteomes of rabbit, mouse, rat and human deposited in Uniprot database (release 2016_09), in which a positive hit was defined that at least 50% fragment of query sequence has more than 70% identity with target sequence. Subsequently, FEELnc tool v12/07/2016 was employed for classifying non-coding RNAs with the default parameters, which compares the genomic location and transcription orientation with respect to the nearest protein-coding RNA¹⁶. Two intrinsic features of both the number of exons per transcript and length ratio of exon to intron were compared between protein-coding and non-coding RNAs using in-house scripts. Furthermore, we employed RSEM tool v1.1.11 for quantifying transcript abundances by Illumina short reads with the default parameters²⁹, which outputs the measure of FPKM (fragments per transcript kilobase per million fragments mapped).

Data availability

The newly constructed rabbit transcriptome (in GTF file) by PacBio reads in the present study is available upon request.

References

Carneiro, M. et al. Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication. Science 345, 1074–1079 (2014).
Article CAS PubMed PubMed Central ADS Google Scholar
Mapara, M., Thomas, B. S. & Bhat, K. M. Rabbit as an animal model for experimental research. Dent. Res. J. 9, 111–118 (2012).
Shiomi, M. Rabbit Biotechnology (eds Houdebine, L.-M. & Fan, J.) 49–63 (Springer Netherlands, 2009).
Calasans-Maia, M. D., Monteiro, M. L., Áscoli, F. O. & Granjeiro, J. M. The rabbit as an animal model for experimental surgery. Acta Cir. Bras. 24, 325–328 (2009).
Article PubMed Google Scholar
Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).
Article CAS PubMed PubMed Central Google Scholar
Sharon, D., Tilgner, H., Grubert, F. & Snyder, M. A single-molecule long-read survey of the human transcriptome. Nat. Biotechnol. 31, 1009–1014 (2013).
Article CAS PubMed PubMed Central Google Scholar
Thomas, S., Underwood, J. G., Tseng, E. & Holloway, A. K. Long-read sequencing of chicken transcripts and identification of new transcript isoforms. PLoS One 9, e94650 (2014).
Article PubMed PubMed Central ADS Google Scholar
Abdel-Ghany, S. E. et al. A survey of the sorghum transcriptome using single-molecule long reads. Nat. Commun. 24, 11706 (2016).
Article ADS Google Scholar
Wang, B. et al. Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nat. Commun. 7, 11708 (2016).
Article CAS PubMed PubMed Central ADS Google Scholar
Minoche, A. E. et al. Exploiting single-molecule transcript sequencing for eukaryotic gene prediction. Genome Biol. 16, 184 (2015).
Article PubMed PubMed Central Google Scholar
Tilgner, H., Grubert, F., Sharon, D. & Snyder, M. P. Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc. Natl. Acad. Sci. USA 111, 9869–9874 (2014).
Article CAS PubMed PubMed Central ADS Google Scholar
Gordon, S. P. et al. Widespread polycistronic transcripts in fungi revealed by single-molecule mRNA sequencing. PLoS One 10, e0132628 (2015).
Article PubMed PubMed Central Google Scholar
Rogers, M. F., Thomas, J., Reddy, A. S. & Ben-Hur, A. SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data. Genome Biol. 13, R4 (2012).
Article CAS PubMed PubMed Central Google Scholar
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
Article CAS PubMed PubMed Central Google Scholar
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
Article CAS PubMed PubMed Central Google Scholar
Wucher, V. et al. FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome. Nucleic Acids Res. 45, e57 (2017).
PubMed PubMed Central ADS Google Scholar
Groenen, M. A. M. et al. Analyses of pig genomes provide insight into porcine demography and evolution. Nature 491, 393–398 (2012).
Article CAS PubMed PubMed Central ADS Google Scholar
Daetwyler, H. D. et al. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nat. Genet. 46, 858–865 (2014).
Article CAS PubMed Google Scholar
Fontanesi, L. et al. LaGomiCs-Lagomorph Genomics Consortium: an international collaborative effort for sequencing the genomes of an entire mammalian order. J. Hered. 107, 295–308 (2016).
Article PubMed PubMed Central Google Scholar
Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012).
Article CAS PubMed PubMed Central Google Scholar
Thompson, J. F. & Milos, P. M. The properties and applications of single-molecule DNA sequencing. Genome Biol. 12, 217 (2011).
CAS PubMed PubMed Central Google Scholar
Koren, S. et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat. Biotechnol. 30, 693–700 (2012).
Article CAS PubMed PubMed Central Google Scholar
Salmela, L. & Rivals, E. LoRDEC: accurate and efficient long read error correction. Bioinformatics 30, 3506–3514 (2014).
Article CAS PubMed PubMed Central Google Scholar
Patel, R. K. & Jain, M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7, e30619 (2012).
Article CAS PubMed PubMed Central ADS Google Scholar
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
Article PubMed PubMed Central Google Scholar
Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Article CAS PubMed Google Scholar
Deng, F. & Chen, S.-Y. dbHT-Trans: an efficient tool for filtering the protein-encoding transcripts assembled by RNA-Seq according to search for homologous proteins. J. Comput. Biol. 23, 1–9 (2016).
Article PubMed Google Scholar
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was financially supported by Science & Technology Department of Sichuan Province (2016NYZ0046) and Earmarked Fund for China Agriculture Research System (CARS-44-A-2).

Author information

Shi-Yi Chen and Feilong Deng contributed equally to this work.

Authors and Affiliations

Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, 611130, China
Shi-Yi Chen, Feilong Deng, Xianbo Jia, Cao Li & Song-Jia Lai

Authors

Shi-Yi Chen
View author publications
You can also search for this author in PubMed Google Scholar
Feilong Deng
View author publications
You can also search for this author in PubMed Google Scholar
Xianbo Jia
View author publications
You can also search for this author in PubMed Google Scholar
Cao Li
View author publications
You can also search for this author in PubMed Google Scholar
Song-Jia Lai
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

S.Y.C. and S.J.L. designed experiments. S.Y.C., F.D. and C.L. performed bioinformatic analyses. X.J. and S.J.L. prepared samples. S.Y.C. wrote manuscript with help of all co-authors.

Corresponding authors

Correspondence to Shi-Yi Chen or Song-Jia Lai.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Chen, SY., Deng, F., Jia, X. et al. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing. Sci Rep 7, 7648 (2017). https://doi.org/10.1038/s41598-017-08138-z

Download citation

Received: 07 December 2016
Accepted: 07 July 2017
Published: 09 August 2017
DOI: https://doi.org/10.1038/s41598-017-08138-z

This article is cited by

PacBio single-molecule long-read sequencing provides new insights into the complexity of full-length transcripts in oriental river prawn, macrobrachium nipponense
- Cheng-Yan Mou
- Qiang Li
- Lu Zhang
BMC Genomics (2023)
Isoform-resolved transcriptome of the human preimplantation embryo
- Denis Torre
- Nancy J. Francoeur
- Robert Sebra
Nature Communications (2023)
Dysregulation and therapeutic targeting of RNA splicing in cancer
- Robert F. Stanley
- Omar Abdel-Wahab
Nature Cancer (2022)
Rabbit microbiota across the whole body revealed by 16S rRNA gene amplicon sequencing
- Xiaofen Hu
- Fei Wang
- Yong Li
BMC Microbiology (2021)
Time course profiling of host cell response to herpesvirus infection using nanopore and synthetic long-read transcriptome sequencing
- Zoltán Maróti
- Dóra Tombácz
- Zsolt Boldogkői
Scientific Reports (2021)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Introduction

Results

PacBio sequencing and error correction of long reads

Construction of transcriptomes

Alternative splicing and polyadenylation

Comparison with Ensembl annotation

Classification of non-coding RNAs

Recovery of paralogous genes

Discussion

Materials and Methods

Ethics statement

Collection of samples and RNA preparation

Library construction and PacBio sequencing

Illumina short-read sequencing and transcriptome assembly

Error correction of PacBio reads

Mapping and transcriptome construction by PacBio reads

Transcriptome analyses

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing Interests

Additional information

Electronic supplementary material

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links