Background & Summary

Most bats have evolved echolocation to navigate, explore their environment and hunt prey in darkness1. All echolocating bats require ultrahigh frequency hearing to receive ultrahigh frequency sounds, which is essential to echolocation2. High frequency hearing is also important for non-echolocating mammals, including humans. However, the molecular mechanisms underlying the origin of high frequency hearing are still unknown3. Echolocating bats with ultrahigh frequency hearing provide a unique model for studying the molecular basis of high frequency hearing in mammals.

Modulation of gene expression and alternative mRNA splicing are two major forms of transcriptional regulation, responsible for the origin of novel phenotypes and phenotypic diversity4,5,6,7. Recently, high-throughput transcriptome sequencing (RNA-seq) of cochlear tissue has been used to uncover differentially expressed genes possibly associated with the origin of ultrahigh frequency hearing8, the divergence of different echolocating types9 and echolocation call frequency variation10. In these earlier studies, the reference used for quantification of gene expression was a de novo assembly of short RNA-seq reads, which may contain many artificial transcripts11. PacBio single-molecule real-time isoform sequencing (Iso-seq) can generate full-length (FL) sequences of transcripts without the need for assembly12, and it has been integrated with RNA-seq for transcriptome quantification in multiple studies12,13. PacBio Iso-seq has also been used to detect alternative splicing events without the help of a reference genome sequence14 and to identify previously unannotated transcripts15. So far, no PacBio Iso-seq study has been conducted on the cochlear tissue of echolocating bats.

In this study we generated FL transcriptome datasets from the cochlear tissue of two kinds of echolocating bats using PacBio Iso-seq. Echolocating bats with ultrahigh frequency hearing (laryngeal echolocation) include constant-frequency (CF) bats and frequency-modulated (FM) bats16. We collected cochlear tissues from both CF and FM bats in order to obtain a comprehensive list of transcripts involved in ultrahigh frequency hearing (Table 1). We chose Rhinolophus affinis (Rhinolophidae) and Myotis ricketti (Vespertilionidae) as the representatives of CF and FM bats, respectively. To allow future investigation of the genetic basis of intraspecific echolocation call frequency variation, we included two Rhinolophus affinis subspecies (R. a. hainanus and R. a. himalayanus) which show divergent echolocation call frequencies17,18. For clarity, the FL transcriptomes from the CF bats (R. a. hainanus and R. a. himalayanus) and the FM bat (Myotis ricketti) are called FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo, respectively. After PacBio Iso-seq data processing, we obtained a total of 10,103, 9,676 and 10,504 non-redundant FL transcripts for FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo, respectively, ranging in size from 201 bp to 9,740 bp (Table 2). The numbers of transcripts annotated at least once in the NCBI non-redundant protein sequences (Nr) database or the UniProtKB database are 9,564, 9,079 and 10,090, respectively (Table 3). By combining the datasets from the three taxa we also generated an FL transcriptome of echolocating bats (FL-CF-FM), which contains 26,342 FL transcripts, 24,833 of which are annotated in the Nr or UniProtKB database (Tables 2 and 3).

Table 1 Detailed information about Iso-seq libraries.
Table 2 Statistics of the four FL transcriptomes generated in this study.
Table 3 Annotation statistics for each of the four FL transcriptomes.

One limitation of this study is that we did not include biological replicates when generating the Iso-seq dataset for each taxon, because of the limited tissue available and the large amount of RNA required for PacBio Iso-seq library construction. The high cost of PacBio sequencing is currently another constraint. If the main aim of a study is to identify transcripts expressed in one or multiple tissues, as in most current studies using FL transcriptome sequencing, additional biological replicates are unnecessary. Nevertheless, we pooled RNA from three individuals during library construction for each of the three taxa in this study. In this way, we tried to avoid missing transcripts because of RNA degradation in a specific individual, and thus to obtain a comprehensive list of transcripts expressed in the cochlea.

The FL transcriptomes generated in this study can be reused in several ways. They can serve as the reference for reanalyzing the cochlear RNA-seq datasets from previous comparative transcriptomic studies8,9,10. Quantifying transcript expression by mapping reads to the FL transcriptome will help to improve the accuracy of identifying differentially expressed transcripts12. Moreover, by comparison with transcripts expressed in non-echolocating mammals, the current FL transcriptomes from echolocating bats will help to test whether alternative splicing plays an important role in the origin of a novel phenotype (ultrahigh frequency hearing). In addition, FL transcriptomes from the FM bat and the two CF subspecies could be used to test the roles of alternative splicing in the divergence of different echolocating types (CF and FM) and in intraspecific echolocation call frequency variation. Finally, these FL transcriptome datasets will be useful for identifying novel transcripts and for improving the genome annotation of Rhinolophus affinis, Myotis ricketti and other bat species19,20.

Methods

Sample collection and RNA preparation

We captured nine adult male bats from China including three Myotis ricketti from Jiangsu on April 19, 2018, three Rhinolophus affinis hainanus from Hainan on May 6, 2019, and three R. a. himalayanus from Anhui on January 4, 2019. Bats were rapidly euthanized by cervical dislocation, and cochleae were collected and transferred to RNase-free PCR tubes. Tissue samples were frozen immediately in liquid nitrogen and stored at −80 °C until RNA extraction. All sampling procedures were in accordance with the guidelines of Regulations for the Administration of Laboratory Animals approved by the Animal Ethics Committee of East China Normal University (ID no: bf20190301).

RNA from each tissue was extracted individually using Trizol reagent (Invitrogen, CA, USA) according to the manufacturer’s instructions. Poly-A mRNAs were harvested using oligo-dT attached magnetic beads. RNA concentration was assessed using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, USA), and RNA integrity number (RIN) values were assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA) (Fig. 1 and Table 1).

Fig. 1 Overview of the sequencing data collection (a) and analysis pipeline (b).

Library construction and full-length sequencing

RNA from three individuals of each taxon (R. a. hainanus, R. a. himalayanus and Myotis ricketti) was pooled to obtain a sufficient amount of RNA (800–1000 ng) for PacBio Iso-seq library construction. We built one independent SMRTbell library for each taxon (three libraries in total) with the PacBio DNA Template Prep Kit 3.0 according to the manufacturer’s instructions. SMRT sequencing was performed on the PacBio Sequel platform.

Generation of the full-length transcriptomes

PacBio Iso-seq raw data (subreads) from each taxon were analyzed using the SMRTLink software (v6.0). First, the circular consensus sequences (CCSs) were generated from subreads. The FL sequences with intact 5′ and 3′ primers and poly-A tails were identified and used in the following analysis. Then, lima, implemented in IsoSeq3 from SMRTLink, was used to remove primers and identify barcodes. After trimming the poly-A tails and removing chimeric sequences, the cluster function in IsoSeq3 was used to produce full-length non-chimeric (FLNC) sequences. FLNC sequences were polished with the Arrow model in IsoSeq3 to generate high quality isoforms with an accuracy >99%. Redundancy was removed using CD-HIT-EST (version 4.7)21 with a 99% sequence similarity threshold, and transcripts shorter than 200 bp were filtered out, resulting in an FL transcriptome (Fig. 1). Finally, by combining the three FL transcriptomes and removing redundant transcripts, we generated an FL transcriptome from both CF and FM bats (hereafter called FL-CF-FM). We assessed the completeness of each of the four FL transcriptomes by searching against single-copy orthologues (4,104 genes shared by 50 mammal species; http://busco.ezlab.org) using mammalia_odb9 BUSCO version 3.0.222.
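The final length filter and the summary statistics reported in Table 2 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`read_fasta`, `filter_by_length`, `length_summary`) and the toy sequences are hypothetical, and the sketch assumes that transcripts of exactly 200 bp are kept, which the text does not state explicitly.

```python
from statistics import mean


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=200):
    """Keep records at least min_len bases long (assumed inclusive cutoff)."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def length_summary(records):
    """Return count, minimum, maximum and mean sequence length."""
    lengths = [len(s) for _, s in records]
    return {"n": len(lengths), "min": min(lengths),
            "max": max(lengths), "mean": mean(lengths)}


# Toy input (hypothetical): t1 is 150 bp, t2 is 240 bp, t3 is 250 bp on two lines.
toy = [">t1", "A" * 150, ">t2", "ACGT" * 60, ">t3", "G" * 100, "C" * 150]
kept = filter_by_length(read_fasta(toy))
summary = length_summary(kept)
print([h for h, _ in kept], summary)
```

On real data, the same functions could be applied to an open file handle of the CD-HIT-EST output, since `read_fasta` accepts any iterable of lines.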

Functional annotation

Each of the four FL transcriptomes was functionally annotated by performing a local BLASTx search against two protein databases, the Nr protein database (http://www.ncbi.nlm.nih.gov, accessed December 1, 2019) and UniProtKB (http://www.expasy.ch/sprot, accessed July 6, 2019), with an E-value threshold of 1e-5.
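As an illustration of how an E-value threshold of 1e-5 translates into a per-transcript annotation, the sketch below selects the best hit for each query from tabular search output. It rests on assumptions not stated in the text: that results are in the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score), that the threshold is inclusive, and that the best hit is the one with the highest bit score. The function name `best_hits` and the toy rows are hypothetical.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for the top-scoring hit
    of each query whose E-value does not exceed max_evalue."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit fails the E-value threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy rows (hypothetical identifiers and scores).
rows = [
    "tx1\tP1\t98.0\t300\t6\t0\t1\t900\t1\t300\t1e-150\t520",
    "tx1\tP2\t80.0\t250\t50\t0\t1\t750\t1\t250\t1e-60\t210",
    "tx2\tP3\t35.0\t60\t39\t0\t1\t180\t1\t60\t0.5\t30.1",
]
hits = best_hits(rows)
print(hits)
```

In this toy input, tx1 is annotated by its higher-scoring hit and tx2 remains unannotated because its only hit fails the threshold.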

Data Records

The raw FL sequencing data for each taxon have been deposited in the NCBI Sequence Read Archive (SRA) (Accession numbers: SRR1206284523, SRR1206284424 and SRR1206284325) (Table 1). The FL transcriptomes from the three taxa have been deposited in the NCBI Transcriptome Shotgun Assembly (TSA) database (Accession numbers: GIRV0000000026, GIRW0000000027 and GIRX0000000028) (Table 1). The FL transcriptomes and functional annotation results for each of the four FL transcriptomes have been deposited in Figshare29.

Technical Validation

Quality control of the full-length transcriptomes

The FL transcriptomes for R. a. hainanus, R. a. himalayanus and Myotis ricketti were constructed from the sequencing data of three separate libraries on the PacBio Sequel platform. Specifically, a total of 3,444,947 subreads with 6,448,987,299 nucleotides, 3,255,638 subreads with 6,504,282,447 nucleotides and 3,403,451 subreads with 7,190,237,257 nucleotides were generated for R. a. hainanus, R. a. himalayanus and Myotis ricketti, respectively. After quality control, we obtained 137,159 circular consensus sequencing (CCS) reads for R. a. hainanus, 137,160 CCS reads for R. a. himalayanus and 152,251 CCS reads for Myotis ricketti. With the standard IsoSeq3 classification and clustering pipeline, we identified 111,806 FLNC sequences for R. a. hainanus, 105,713 for R. a. himalayanus and 122,222 for Myotis ricketti. After isoform-level polishing, 10,384, 9,984 and 10,932 high quality isoforms were retained for R. a. hainanus, R. a. himalayanus and Myotis ricketti, respectively. After removing redundancy with CD-HIT-EST and filtering out isoforms shorter than 200 bp, the final FL transcriptomes for R. a. hainanus, R. a. himalayanus and Myotis ricketti (FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo, respectively) contain 10,103, 9,676 and 10,504 FL isoforms with an average length of 2,251, 2,370 and 2,530 bp, respectively (Table 2). Finally, the FL transcriptome from both CF and FM bats (FL-CF-FM) contains 26,342 transcripts with an average length of 2,405 bp (Table 2). BUSCO analysis revealed that a total of 2,354 (57.4%) BUSCOs were included in FL-CF-FM. We also found 39.9%, 38.1% and 41.9% of BUSCOs in FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo, respectively (Table 4). Given the highly specialized function of the cochlea, high BUSCO completeness is not expected for a cochlear FL transcriptome. A recent single cell RNA-seq study has identified a similar number of genes expressed in the murine cochlea (a total of 12,944)30.
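The percentages in this paragraph follow directly from the reported counts, as the short Python check below shows. The retention rates and the number of transcripts removed when the three transcriptomes were merged are derived here for illustration and are not reported in the text; the helper name `percent` is hypothetical.

```python
def percent(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# BUSCO completeness of FL-CF-FM: 2,354 of the 4,104 mammalian BUSCOs.
busco_fl_cf_fm = percent(2354, 4104)

# Share of polished isoforms kept after CD-HIT-EST and the 200 bp filter.
polished = {"FL-CF-Rhai": 10384, "FL-CF-Rhim": 9984, "FL-FM-Myo": 10932}
final = {"FL-CF-Rhai": 10103, "FL-CF-Rhim": 9676, "FL-FM-Myo": 10504}
retained = {name: percent(final[name], polished[name]) for name in final}

# Transcripts removed as redundant when merging the three transcriptomes
# into FL-CF-FM (26,342 transcripts).
merged_out = sum(final.values()) - 26342

print(busco_fl_cf_fm)  # 57.4
print(retained)
print(merged_out)      # 3941
```

The first value reproduces the 57.4% stated above, which confirms that the percentage is computed against the 4,104 single-copy orthologues of the reference set.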

Table 4 Completeness of each of the four FL transcriptomes assessed by benchmarking universal single-copy ortholog (BUSCO) analysis.

Quality control of annotation

The four FL transcriptomes (FL-CF-Rhai, FL-CF-Rhim, FL-FM-Myo and FL-CF-FM) were functionally annotated by performing DIAMOND and BLASTx searches against the Nr and UniProt databases separately. For FL-CF-FM, 24,793 and 24,198 transcripts were annotated against the Nr database and the UniProt database, respectively (Table 3). After combining the annotation results from the two databases, a total of 24,833 transcripts were annotated in at least one database. We obtained similar annotation results for FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo (Table 3). Transcripts without annotations may represent novel isoforms in echolocating animals or may reflect the lack of representative cochlear sequences in public databases.
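Combining the two annotation sets is a set union, so the overlap between databases can be recovered from the reported counts by inclusion-exclusion. The sketch below is illustrative only: the function name `overlap_from_counts` is hypothetical, and the derived values (transcripts annotated in both databases, in only one, or in neither) are calculated from the FL-CF-FM counts above rather than reported in the text.

```python
def overlap_from_counts(n_a, n_b, n_union, n_total):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    both = n_a + n_b - n_union
    return {
        "both": both,
        "only_a": n_a - both,
        "only_b": n_b - both,
        "none": n_total - n_union,
    }


# FL-CF-FM: 24,793 annotated in Nr, 24,198 in UniProt, 24,833 in at least
# one database, out of 26,342 transcripts in total.
fl_cf_fm = overlap_from_counts(24793, 24198, 24833, 26342)
print(fl_cf_fm)

# The same logic with explicit sets of transcript IDs (toy, hypothetical IDs).
nr_ids = {"tx1", "tx2", "tx3"}
uniprot_ids = {"tx2", "tx3", "tx4"}
annotated = nr_ids | uniprot_ids
print(len(annotated), len(nr_ids & uniprot_ids))
```

Under these counts, most annotated transcripts have hits in both databases, and the transcripts in the "none" category are the unannotated ones discussed above.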