Transcriptome sequencing of cochleae from constant-frequency and frequency-modulated echolocating bats

Echolocating bats are fascinating for their ability to ‘see’ the world in the darkness. Ultrahigh frequency hearing is essential for echolocation. In this study we collected cochlear tissues from constant-frequency (CF) bats (two subspecies of Rhinolophus affinis, Rhinolophidae) and frequency-modulated (FM) bats (Myotis ricketti, Vespertilionidae) and applied PacBio single-molecule real-time isoform sequencing (Iso-seq) technology to generate full-length (FL) transcriptomes for the three taxa. In total, 10103, 9676 and 10504 non-redundant FL transcripts were obtained for R. a. hainanus, R. a. himalayanus and Myotis ricketti, respectively. These data present a comprehensive list of transcripts involved in ultrahigh frequency hearing of echolocating bats, comprising 26342 FL transcripts, 24833 of which are annotated in public databases. No further comparative analyses were performed on the current data in this study. These data can be reused to quantify gene or transcript expression, assess the level of alternative splicing, identify novel transcripts and improve the genome annotation of bat species.


Background & Summary
Most bats have evolved echolocation to navigate, explore the environment and hunt prey in the darkness 1. All echolocating bats require ultrahigh frequency hearing to receive ultrahigh frequency sounds, which is essential to the process of echolocation 2. High frequency hearing is also important for non-echolocating mammals, including humans. However, the molecular mechanisms underlying the origin of high frequency hearing are still unknown 3. Echolocating bats with ultrahigh frequency hearing provide a unique model for studying the molecular basis of high frequency hearing in mammals.
Modulation of gene expression and alternative mRNA splicing are two major forms of transcriptional regulation, responsible for the origin of novel phenotypes and phenotypic diversity [4][5][6][7]. Recently, high-throughput transcriptome sequencing (RNA-seq) of cochlear tissue has been used to uncover differentially expressed genes possibly associated with the origin of ultrahigh frequency hearing 8, the divergence of different echolocating types 9 and echolocation call frequency variation 10. In these earlier studies, the reference used for quantification of gene expression was a de novo assembly based on short RNA-seq reads, which may contain many artificial transcripts 11. PacBio single-molecule real-time isoform sequencing (Iso-seq) can generate full-length (FL) sequences of all transcripts without the need for assembly 12, and has been integrated with RNA-seq for transcriptome quantification in multiple studies 12,13. PacBio Iso-seq is also used to detect alternative splicing events without the help of a reference genome sequence 14 and to identify previously unannotated transcripts 15. So far, no PacBio Iso-seq study has been conducted on the cochlear tissue of echolocating bats.
In this study we generated FL transcriptome datasets from the cochlear tissue of two kinds of echolocating bats using PacBio Iso-seq. Echolocating bats with ultrahigh frequency hearing (laryngeal echolocation) include constant-frequency (CF) bats and frequency-modulated (FM) bats 16. We collected cochlear tissues from both CF and FM bats in order to get a comprehensive list of transcripts involved in ultrahigh frequency hearing (Table 1). We chose Rhinolophus affinis (Rhinolophidae) and Myotis ricketti (Vespertilionidae) as the representatives of CF and FM bats, respectively. To allow future investigation of the genetic basis of intraspecific echolocation call frequency variation, we included two Rhinolophus affinis subspecies (R. a. hainanus and R. a. himalayanus) which show divergent echolocation call frequencies 17,18. For clarity, the FL transcriptomes from the CF bats (R. a. hainanus and R. a. himalayanus) and the FM bat (Myotis ricketti) are called FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo, respectively. After PacBio Iso-seq data processing, we obtained a total of 10103, 9676 and 10504 non-redundant FL transcripts for FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo, respectively, ranging in size from 201 bp to 9740 bp (Table 2). The number of transcripts annotated at least once in the NCBI non-redundant protein sequences (Nr) or UniprotKB databases is 9564, 9079 and 10090, respectively (Table 3). By combining the datasets from the three taxa we also generated a FL transcriptome of echolocating bats (FL-CF-FM), which contains 26342 FL transcripts, 24833 of them annotated in the Nr or UniprotKB database (Tables 2 and 3).
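As a quick arithmetic check on the counts quoted above, the following minimal Python sketch derives the share of annotated transcripts in each dataset. The counts are copied from the text. The variable and function names are illustrative only. The reading of the difference between the per-taxon sum and the combined set as transcripts collapsed during merging is an assumption, since the text does not describe how the datasets were combined.

```python
# Transcript counts copied from the text: (total FL transcripts, annotated transcripts).
counts = {
    "FL-CF-Rhai": (10103, 9564),
    "FL-CF-Rhim": (9676, 9079),
    "FL-FM-Myo": (10504, 10090),
    "FL-CF-FM": (26342, 24833),
}


def annotation_rate(total, annotated):
    """Return the percentage of transcripts with at least one annotation."""
    return 100.0 * annotated / total


rates = {name: annotation_rate(t, a) for name, (t, a) in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% annotated")

# Sum of the three per-taxon sets versus the size of the combined set.
# Assumption: the difference reflects transcripts shared between taxa that
# were collapsed when the datasets were combined.
per_taxon_sum = sum(counts[k][0] for k in ("FL-CF-Rhai", "FL-CF-Rhim", "FL-FM-Myo"))
collapsed = per_taxon_sum - counts["FL-CF-FM"][0]
print(f"per-taxon sum: {per_taxon_sum}, collapsed in combined set: {collapsed}")
```

Under these figures, roughly 94% of the combined FL-CF-FM transcripts carry at least one annotation, and the three per-taxon rates fall within a few percentage points of each other.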
One limitation of this study is that we did not include biological replicates when generating the Iso-seq dataset for each taxon, because of the limited tissue available and the large amount of RNA required for PacBio Iso-seq library construction. The currently high cost of PacBio sequencing is another constraint. If the main aim of a study is to identify transcripts expressed in one or multiple tissues, as in most current studies using FL transcriptome sequencing, additional biological replicates are unnecessary. Nevertheless, we pooled RNA from three individuals during library construction for each of the three echolocating bat taxa in this study. In this way, we tried to avoid missing transcripts because of RNA degradation in a specific individual, and thus to obtain a comprehensive list of transcripts expressed in the cochlea.
The FL transcriptomes generated in this study can be reused in several ways. They can be used as the reference to reanalyze the cochlear RNA-seq datasets from previous comparative transcriptomic studies [8][9][10]. Quantification of transcript expression by mapping reads to the FL transcriptome will help to improve the accuracy of identifying differentially expressed transcripts 12. Moreover, by comparison with transcripts expressed in non-echolocating mammals, the current FL transcriptomes from echolocating bats will help to test whether alternative splicing plays an important role in the origin of a novel phenotype (ultrahigh frequency hearing). In addition, FL transcriptomes from FM bats and two CF subspecies could be used to test the roles of alternative splicing in the divergence of different echolocating types (CF and FM) and in intraspecific echolocation call frequency variation. Finally, these FL transcriptome datasets will be useful for identification of novel transcripts and for improvement of the genome annotation of Rhinolophus affinis, Myotis ricketti, and other bat species 19,20.
Generation of the full-length transcriptomes. PacBio Iso-seq raw data (subreads) from each taxon were analyzed using the SMRTLink software (v6.0). First, the circular consensus sequences (CCSs) were generated from subreads. The FL sequences with intact 5′ and 3′ primers and poly-A tails were identified and used in the following analysis. Then, lima, implemented in IsoSeq3 from SMRTLink, was used to remove primers and identify barcodes. After trimming the poly-A tails and removing chimeric reads, the cluster function in IsoSeq3 was used to produce full-length non-chimeric (FLNC) sequences. FLNC sequences were polished with the arrow model in IsoSeq3 to generate high quality isoforms with an accuracy >99%. Redundancy was removed using CD-HIT-EST (version 4.7) 21 with a 99% sequence similarity threshold, and transcripts shorter than 200 bp were filtered out, resulting in a FL transcriptome (Fig. 1).
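The final length-filtering step can be illustrated with a minimal Python sketch. It is not the authors' script: the helper names (`read_fasta`, `filter_by_length`) and the toy transcript identifiers are hypothetical, and only the 200 bp threshold comes from the text. The redundancy-removal step itself is left to CD-HIT-EST, where the sequence identity threshold is set with the `-c` option (0.99 would correspond to the 99% threshold); the full command line used is not given in the text.

```python
from io import StringIO

MIN_LENGTH = 200  # from the text: transcripts shorter than 200 bp are removed


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=MIN_LENGTH):
    """Keep records whose sequence is at least min_length bases long."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Toy input with hypothetical transcript names; this is not real data.
toy = StringIO(
    ">tx1\n" + "A" * 150 + "\n"
    ">tx2\n" + "ACGT" * 50 + "\n"
    ">tx3\n" + "G" * 120 + "\n" + "C" * 130 + "\n"
)
kept = filter_by_length(read_fasta(toy))
lengths = [len(seq) for _, seq in kept]
print(len(kept), min(lengths), max(lengths))
```

On real data, the same function would be applied to the CD-HIT-EST output, and the minimum and maximum of `lengths` could be compared against the size range reported in Table 2.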

Data Records
The raw FL sequencing data for each taxon have been deposited in the NCBI Sequence Read Archive (SRA) (Accession numbers: SRR12062845 23 (Table 2). BUSCO analysis revealed that a total of 2,354 (57.4%) BUSCOs were included in FL-CF-FM. We also found 39.9%, 38.1% and 41.9% of BUSCOs in FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo, respectively (Table 4). Given the highly specialized function of the cochlea, a high BUSCO value should not be expected for a cochlear FL transcriptome. A recent single-cell RNA-seq study identified a similar number of genes expressed in the murine cochlea (a total of 12,944) 30 (Table 3). After combining the annotation results from the two databases, a total of 24,833 transcripts were annotated in at least one database. We obtained similar annotation results for FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo (Table 3). Transcripts without annotations might be novel isoforms of echolocating animals, or might reflect the lack of representative cochlear sequences in public databases.

Code availability
The software versions and parameters used in this study are described below.
Table 4. Completeness of each of the four FL transcriptomes assessed by benchmarking universal single-copy ortholog (BUSCO) analysis.