Transcriptome sequencing of cochleae from constant-frequency and frequency-modulated echolocating bats

Ma, Lu; Sun, Haijian; Mao, Xiuguang

doi:10.1038/s41597-020-00686-w

Data Descriptor
Open access
Published: 13 October 2020

Scientific Data volume 7, Article number: 341 (2020)

Abstract

Echolocating bats are fascinating for their ability to ‘see’ the world in darkness. Ultrahigh frequency hearing is essential for echolocation. In this study we collected cochlear tissues from constant-frequency (CF) bats (two subspecies of Rhinolophus affinis, Rhinolophidae) and a frequency-modulated (FM) bat (Myotis ricketti, Vespertilionidae) and applied PacBio single-molecule real-time isoform sequencing (Iso-seq) to generate full-length (FL) transcriptomes for the three taxa. In total, 10103, 9676 and 10504 non-redundant FL transcripts were obtained for R. a. hainanus, R. a. himalayanus and Myotis ricketti, respectively. These data provide a comprehensive list of transcripts involved in ultrahigh frequency hearing of echolocating bats, comprising 26342 FL transcripts, 24833 of which are annotated in public databases. No further comparative analyses were performed on the current data in this study. These data can be reused to quantify gene or transcript expression, assess the level of alternative splicing, identify novel transcripts and improve genome annotation of bat species.

Measurement(s)	cochlea • transcriptome • sequence feature annotation
Technology Type(s)	isoform sequencing • sequence annotation
Factor Type(s)	species
Sample Characteristic - Organism	Rhinolophus affinis • Myotis ricketti

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12999614

Background & Summary

Most bats have evolved echolocation to navigate, explore their environment and hunt prey in darkness¹. All echolocating bats require ultrahigh frequency hearing to receive ultrahigh frequency sounds, which is essential to the process of echolocation². High frequency hearing is also important for non-echolocating mammals, including humans. However, the molecular mechanisms underlying the origin of high frequency hearing are still unknown³. Echolocating bats with ultrahigh frequency hearing provide a unique model for studying the molecular basis of high frequency hearing in mammals.

Modulation of gene expression and alternative mRNA splicing are two major forms of transcriptional regulation, responsible for the origin of novel phenotypes and phenotypic diversity^4,5,6,7. Recently, high-throughput transcriptome sequencing (RNA-seq) of cochlear tissue has been used to uncover differentially expressed genes possibly associated with the origin of ultrahigh frequency hearing⁸, the divergence of different echolocating types⁹ and echolocation call frequency variation¹⁰. In these earlier studies, the reference used for quantification of gene expression was a de novo assembly built from short RNA-seq reads, which may contain many artificial transcripts¹¹. PacBio single-molecule real-time isoform sequencing (Iso-seq) can generate full-length (FL) sequences of all transcripts without the need for assembly¹², and it has been integrated with RNA-seq for transcriptome quantification in multiple studies^12,13. PacBio Iso-seq is also used to detect alternative splicing events without the help of a reference genome sequence¹⁴ and to identify previously unannotated transcripts¹⁵. So far, no PacBio Iso-seq study has been conducted on the cochlear tissue of echolocating bats.

In this study we generated FL transcriptome datasets from the cochlear tissue of two kinds of echolocating bats using PacBio Iso-seq. Echolocating bats with ultrahigh frequency hearing (laryngeal echolocation) include constant-frequency (CF) bats and frequency-modulated (FM) bats¹⁶. We collected cochlear tissues from both CF and FM bats in order to obtain a comprehensive list of transcripts involved in ultrahigh frequency hearing (Table 1). We chose Rhinolophus affinis (Rhinolophidae) and Myotis ricketti (Vespertilionidae) as the representatives of CF and FM bats, respectively. To allow future investigation of the genetic basis of intraspecific echolocation call frequency variation, we included two Rhinolophus affinis subspecies (R. a. hainanus and R. a. himalayanus) which show divergent echolocation call frequencies^17,18. For clarity, the FL transcriptomes from the CF bats (R. a. hainanus and R. a. himalayanus) and the FM bat (Myotis ricketti) are called FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo, respectively. After PacBio Iso-seq data processing, we obtained a total of 10103, 9676 and 10504 non-redundant FL transcripts for FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo, respectively, ranging in size from 201 bp to 9740 bp (Table 2). The number of transcripts annotated at least once in the NCBI non-redundant protein sequences (Nr) and UniProtKB databases is 9564, 9079 and 10090, respectively (Table 3). By combining the datasets from the three taxa we also generated an FL transcriptome of echolocating bats (FL-CF-FM), which contains 26342 FL transcripts, 24833 of them annotated in the Nr or UniProtKB database (Tables 2 and 3).

Table 1 Detailed information about Iso-seq libraries.

Table 2 Statistics of the four FL transcriptomes generated in this study.

Table 3 Annotation statistics for each of the four FL transcriptomes.

One limitation of this study is that we did not include biological replicates when generating the Iso-seq dataset for each taxon, because limited tissue was available and PacBio Iso-seq library construction requires a large amount of RNA. The high cost of PacBio sequencing is currently another constraint. If the main aim of a study is to identify transcripts expressed in one or multiple tissues, as in most current studies using FL transcriptome sequencing, additional biological replicates are unnecessary. Nevertheless, we pooled RNA from three individuals when constructing the library for each of the three taxa. In this way, we tried to avoid missing transcripts because of RNA degradation in a specific individual and thus to obtain a comprehensive list of transcripts expressed in the cochlea.

The FL transcriptomes generated in this study can be reused in several ways. They can serve as the reference for reanalyzing the cochlear RNA-seq datasets of previous comparative transcriptomic studies^8,9,10. Quantifying transcript expression by mapping reads to the FL transcriptome will help to improve the accuracy of identifying differentially expressed transcripts¹². Moreover, by comparison with transcripts expressed in non-echolocating mammals, the current FL transcriptomes from echolocating bats will help to test whether alternative splicing plays an important role in the origin of a novel phenotype (ultrahigh frequency hearing). In addition, the FL transcriptomes from the FM bat and the two CF subspecies could be used to test the roles of alternative splicing in the divergence of different echolocating types (CF and FM) and in intraspecific echolocation call frequency variation. Finally, these FL transcriptome datasets will be useful for identifying novel transcripts and for improving the genome annotation of Rhinolophus affinis, Myotis ricketti and other bat species^19,20.

Methods

Sample collection and RNA preparation

We captured nine adult male bats from China including three Myotis ricketti from Jiangsu on April 19, 2018, three Rhinolophus affinis hainanus from Hainan on May 6, 2019, and three R. a. himalayanus from Anhui on January 4, 2019. Bats were rapidly euthanized by cervical dislocation, and cochleae were collected and transferred to RNase-free PCR tubes. Tissue samples were frozen immediately in liquid nitrogen and stored at −80 °C until RNA extraction. All sampling procedures were in accordance with the guidelines of Regulations for the Administration of Laboratory Animals approved by the Animal Ethics Committee of East China Normal University (ID no: bf20190301).

RNA from each tissue was extracted individually using Trizol reagent (Invitrogen, CA, USA) according to the manufacturer’s instructions. Poly-A mRNAs were harvested using oligo-dT attached magnetic beads. RNA concentration was assessed using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, USA), and RNA integrity number (RIN) values were assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA) (Fig. 1 and Table 1).

Library construction and full-length sequencing

RNA from three individuals of each taxon (R. a. hainanus, R. a. himalayanus and Myotis ricketti) was pooled to obtain a sufficient amount of RNA (800–1000 ng) for PacBio Iso-seq library construction. We built one independent SMRTbell library for each taxon (a total of three libraries) with the PacBio DNA Template Prep Kit 3.0 according to the manufacturer’s instructions. SMRT sequencing was performed on the PacBio Sequel platform.

Generation of the full-length transcriptomes

PacBio Iso-seq raw data (subreads) from each taxon were analyzed using the SMRTLink software (v6.0). First, the circular consensus sequences (CCSs) were generated from subreads. The FL sequences with intact 5′ and 3′ primers and poly-A tails were identified and used in the following analysis. Then, lima, implemented in IsoSeq3 from SMRTLink, was used to remove primers and identify barcodes. After the poly-A tails were trimmed and chimeric reads removed, the cluster function in IsoSeq3 was used to produce full-length non-chimeric (FLNC) sequences. FLNC sequences were polished with the Arrow model in IsoSeq3 to generate high quality isoforms with an accuracy >99%. Redundancy was removed using CD-HIT-EST (version 4.7)²¹ with a 99% sequence similarity threshold, and transcripts shorter than 200 bp were filtered out, resulting in an FL transcriptome (Fig. 1). Finally, by combining the three FL transcriptomes and removing redundant transcripts, we generated an FL transcriptome from both CF and FM bats (hereafter called FL-CF-FM). We assessed the completeness of each of the four FL transcriptomes by searching against single-copy orthologues (4,104 genes shared by 50 mammal species; http://busco.ezlab.org) using mammalia_odb9 with BUSCO version 3.0.2²².
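The final length filter and the per-transcriptome length statistics can be illustrated with a short sketch. This is not the code used in the study: the helper names (read_fasta, filter_by_length, length_summary) and the toy sequences are hypothetical, and the sketch assumes that "shorter than 200 bp" means sequences of 200 bp or more are kept.

```python
from io import StringIO
from statistics import mean


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=200):
    """Keep records whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def length_summary(records):
    """Return count, minimum, maximum and mean sequence length."""
    lengths = [len(s) for _, s in records]
    return {"n": len(lengths), "min": min(lengths),
            "max": max(lengths), "mean": mean(lengths)}


# Toy input (hypothetical sequences, not real isoforms).
demo = StringIO(
    ">short\n" + "A" * 150 + "\n"
    ">wrapped\n" + "C" * 100 + "\n" + "G" * 150 + "\n"
    ">exact\n" + "T" * 200 + "\n"
)
kept = filter_by_length(read_fasta(demo))
summary = length_summary(kept)
print([h for h, _ in kept], summary)
```

Applied to a real FASTA file, the same summary would be expected to give the kind of values reported in Table 2 (transcript count, size range and average length), provided the file is the final non-redundant transcriptome.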

Functional annotation

Each of the four FL transcriptomes was functionally annotated by performing a local BLASTx search against two protein databases, the Nr protein database (http://www.ncbi.nlm.nih.gov, accessed December 1, 2019) and UniProtKB (http://www.expasy.ch/sprot, accessed July 6, 2019), with an E-value cutoff of 1e-5.
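As an illustration of how "annotated in at least one database" can be derived from tabular search output, the sketch below parses 12-column tabular hits (the layout produced by -outfmt 6, where column 1 is the query ID and column 11 the E-value) and takes the union of annotated queries across two databases. The function names and the toy hit rows are hypothetical; the authors' own post-processing is not described in the text.

```python
def annotated_queries(lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit at or below max_evalue.

    Expects tab-separated rows in the 12-column tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    annotated = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            annotated.add(query_id)
    return annotated


def make_hit(query_id, subject_id, evalue):
    """Build a toy 12-column row; only columns 1, 2 and 11 matter here."""
    return "\t".join([query_id, subject_id, "90.0", "100", "10", "0",
                      "1", "100", "1", "100", str(evalue), "150"])


# Hypothetical hits against two databases.
nr_hits = [make_hit("tx1", "protA", 1e-30), make_hit("tx2", "protB", 0.5)]
uniprot_hits = [make_hit("tx2", "protC", 1e-8), make_hit("tx3", "protD", 1e-6)]

nr = annotated_queries(nr_hits)
uniprot = annotated_queries(uniprot_hits)
either = nr | uniprot  # annotated in at least one database
print(len(nr), len(uniprot), len(either))
```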

Data Records

The raw FL sequencing data for each taxon have been deposited in the NCBI Sequence Read Archive (SRA) (Accession numbers: SRR12062845²³, SRR12062844²⁴ and SRR12062843²⁵) (Table 1). The FL transcriptomes of the three taxa have been deposited in the NCBI Transcriptome Shotgun Assembly (TSA) database (Accession numbers: GIRV00000000²⁶, GIRW00000000²⁷ and GIRX00000000²⁸) (Table 1). The FL transcriptomes and functional annotation results for each of the four FL transcriptomes have been deposited in Figshare²⁹.

Technical Validation

Quality control of the full-length transcriptomes

The FL transcriptomes for R. a. hainanus, R. a. himalayanus and Myotis ricketti were constructed from the sequencing data of three separate libraries run on the PacBio Sequel platform. Specifically, a total of 3,444,947 subreads with 6,448,987,299 nucleotides, 3,255,638 subreads with 6,504,282,447 nucleotides and 3,403,451 subreads with 7,190,237,257 nucleotides were generated for R. a. hainanus, R. a. himalayanus and Myotis ricketti, respectively. After quality control, we obtained 137,159 circular consensus sequence (CCS) reads for R. a. hainanus, 137,160 CCS reads for R. a. himalayanus and 152,251 CCS reads for Myotis ricketti. With the standard IsoSeq3 classification and clustering pipeline, we identified 111,806 FLNC sequences for R. a. hainanus, 105,713 for R. a. himalayanus and 122,222 for Myotis ricketti. After isoform-level polishing, 10384, 9984 and 10932 high quality isoforms were retained for R. a. hainanus, R. a. himalayanus and Myotis ricketti, respectively. After removing redundancy with CD-HIT-EST and filtering out isoforms shorter than 200 bp, the final FL transcriptomes for R. a. hainanus, R. a. himalayanus and Myotis ricketti (FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo, respectively) contain 10103, 9676 and 10504 FL isoforms with an average length of 2251, 2370 and 2530 bp, respectively (Table 2). Finally, the FL transcriptome from both CF and FM bats (FL-CF-FM) contains 26,342 transcripts with an average length of 2,405 bp (Table 2). BUSCO analysis revealed that a total of 2,354 (57.4%) BUSCOs were included in FL-CF-FM. We also found 39.9%, 38.1% and 41.9% of BUSCOs in FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo, respectively (Table 4). Given the highly specialized function of the cochlea, a high BUSCO score is not expected for a cochlear FL transcriptome. A recent single-cell RNA-seq study identified a similar number of genes expressed in the murine cochlea (a total of 12,944)³⁰.
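The percentages and differences implied by these counts can be checked with a few lines of arithmetic. The sketch below only re-derives values from the numbers quoted in the text (it does not touch the raw data), and the variable names are chosen for illustration.

```python
# Counts quoted in the text above (not recomputed from raw data).
BUSCO_TOTAL = 4104            # single-copy orthologues in the mammalian set
BUSCO_FOUND_COMBINED = 2354   # BUSCOs recovered in FL-CF-FM
COMBINED_TRANSCRIPTS = 26342  # transcripts in FL-CF-FM

polished = {"FL-CF-Rhai": 10384, "FL-CF-Rhim": 9984, "FL-FM-Myo": 10932}
final = {"FL-CF-Rhai": 10103, "FL-CF-Rhim": 9676, "FL-FM-Myo": 10504}


def percent(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of BUSCOs recovered in the combined transcriptome.
busco_pct = percent(BUSCO_FOUND_COMBINED, BUSCO_TOTAL)

# Isoforms dropped per taxon by redundancy removal plus the length filter.
removed = {name: polished[name] - final[name] for name in polished}

# Transcripts dropped as redundant when the three transcriptomes were merged.
merged_away = sum(final.values()) - COMBINED_TRANSCRIPTS

print(busco_pct, removed, merged_away)
```

The last value shows that the merged transcriptome is smaller than the sum of the three per-taxon sets, which is consistent with redundant transcripts being removed across taxa during merging.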

Table 4 Completeness of each of the four FL transcriptomes assessed by benchmarking universal single-copy ortholog (BUSCO) analysis.

Quality control of annotation

The four FL transcriptomes (FL-CF-Rhai, FL-CF-Rhim, FL-FM-Myo and FL-CF-FM) were functionally annotated by performing DIAMOND and BLASTx searches against the Nr and UniProt databases separately. For FL-CF-FM, 24,793 and 24,198 transcripts were annotated by the Nr and UniProt databases, respectively (Table 3). After combining the annotation results from the two databases, a total of 24,833 transcripts were annotated in at least one database. We obtained similar annotation results for FL-CF-Rhai, FL-CF-Rhim and FL-FM-Myo (Table 3). Transcripts without annotations might be novel isoforms of echolocating animals, or they might reflect the lack of representative cochlear sequences in public databases.
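The overlap between the two databases follows from the three reported counts by inclusion-exclusion. The short sketch below derives the implied numbers for FL-CF-FM; these derived values are not stated in the text and rest only on the counts quoted above.

```python
# Reported counts for FL-CF-FM.
total_transcripts = 26342
nr_annotated = 24793
uniprot_annotated = 24198
either_annotated = 24833  # annotated in at least one database

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
both = nr_annotated + uniprot_annotated - either_annotated
nr_only = nr_annotated - both
uniprot_only = uniprot_annotated - both
unannotated = total_transcripts - either_annotated
annotated_pct = round(100.0 * either_annotated / total_transcripts, 1)

print(both, nr_only, uniprot_only, unannotated, annotated_pct)
```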

Code availability

The software versions and parameters used in this study are described below.
1. SMRTlink: version 6.0, parameters: pbccs.task_options.max_length = 20000 pbccs.task_options.min_length = 300.
2. CD-Hit-Est: version 4.7, parameters: -c 0.99 -T 20 -G 0 -aL 0.90 -AL 100 -aS 0.98 -AS 30 -M 0 -d 0.
3. BUSCO: version 3.0.2, default parameters. -m tran -e 1e-05.
4. BLASTx: version 2.2.29+, parameters: -outfmt 6, -e value 1e-5 --max-target-seqs 1.
5. DIAMOND: version 0.9.24.125.
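For readers who want to reproduce the redundancy-removal step, the CD-HIT-EST parameters listed above can be assembled into a command line as sketched below. The input and output file names are hypothetical placeholders (the original file names are not given), -i and -o are the standard CD-HIT input and output options, and the sketch only builds and prints the command rather than running it.

```python
import shlex


def cd_hit_est_command(fasta_in, fasta_out):
    """Build a cd-hit-est argument list using the parameters listed above."""
    return [
        "cd-hit-est",
        "-i", fasta_in,    # input FASTA (hypothetical file name)
        "-o", fasta_out,   # output FASTA (hypothetical file name)
        "-c", "0.99",      # sequence identity threshold
        "-T", "20",
        "-G", "0",
        "-aL", "0.90",
        "-AL", "100",
        "-aS", "0.98",
        "-AS", "30",
        "-M", "0",
        "-d", "0",
    ]


cmd = cd_hit_est_command("polished_isoforms.fasta", "nonredundant_isoforms.fasta")
print(shlex.join(cmd))
```

Keeping the arguments as a list makes it straightforward to pass them to subprocess.run once the tool is installed, without shell quoting issues.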

References

1. Schnitzler, H. U., Moss, C. F. & Denzinger, A. From spatial orientation to food acquisition in echolocating bats. Trends Ecol Evol 18, 386–394, https://doi.org/10.1016/S0169-5347(03)00185-X (2003).
2. Teeling, E. C., Jones, G. & Rossiter, S. J. In Bat Bioacoustics (eds M. Brock Fenton, Alan D. Grinnell, Arthur N. Popper, & Richard R. Fay) 25–54 (Springer New York, 2016).
3. Pisciottano, F. et al. Inner ear genes underwent positive selection and adaptation in the mammalian lineage. Mol Biol Evol 36, 1653–1670, https://doi.org/10.1093/molbev/msz077 (2019).
4. Harrison, P. W., Wright, A. E. & Mank, J. E. The evolution of gene expression and the transcriptome-phenotype relationship. Semin Cell Dev Biol 23, 222–229, https://doi.org/10.1016/j.semcdb.2011.12.004 (2012).
5. Martin, A. & Orgogozo, V. The Loci of Repeated Evolution: A catalog of genetic hotspots of phenotypic variation. Evolution 67, 1235–1250, https://doi.org/10.1111/evo.12081 (2013).
6. Singh, P., Borger, C., More, H. & Sturmbauer, C. The role of alternative splicing and differential gene expression in cichlid adaptive radiation. Genome Biol Evol 9, 2764–2781, https://doi.org/10.1093/gbe/evx204 (2017).
7. Bush, S. J., Chen, L., Tovar-Corona, J. M. & Urrutia, A. O. Alternative splicing and the evolution of phenotypic novelty. Philos Trans R Soc Lond B Biol Sci 372, 20150474, https://doi.org/10.1098/rstb.2015.0474 (2017).
8. Dong, D., Lei, M., Liu, Y. & Zhang, S. Comparative inner ear transcriptome analysis between the Rickett’s big-footed bats (Myotis ricketti) and the greater short-nosed fruit bats (Cynopterus sphinx). BMC Genomics 14, 916, https://doi.org/10.1186/1471-2164-14-916 (2013).
9. Wang, H., Zhao, H., Huang, X., Sun, K. & Feng, J. Comparative cochlear transcriptomics of echolocating bats provides new insights into different nervous activities of CF bat species. Sci Rep 8, 15934, https://doi.org/10.1038/s41598-018-34333-7 (2018).
10. Zhao, H. et al. Gene expression vs. sequence divergence: comparative transcriptome sequencing among natural Rhinolophus ferrumequinum populations with different acoustic phenotypes. Front Zool 16, 37, https://doi.org/10.1186/s12983-019-0336-7 (2019).
11. Stark, R., Grzelak, M. & Hadfield, J. RNA sequencing: the teenage years. Nat Rev Genet 20, 631–656, https://doi.org/10.1038/s41576-019-0150-2 (2019).
12. Gao, Y. B. et al. Single-molecule Real-time (SMRT) Isoform Sequencing (Iso-Seq) in Plants: The status of the bioinformatics tools to unravel the transcriptome complexity. Curr Bioinform 14, 566–573, https://doi.org/10.2174/1574893614666190204151746 (2019).
13. Hu, Z. et al. Full-Length transcriptome assembly of Italian ryegrass root integrated with RNA-seq to identify genes in response to plant cadmium stress. Int J Mol Sci 21, 1067, https://doi.org/10.3390/ijms21031067 (2020).
14. Liu, X., Mei, W., Soltis, P. S., Soltis, D. E. & Barbazuk, W. B. Detecting alternatively spliced transcript isoforms from single-molecule long-read sequences without a reference genome. Mol Ecol Resour 17, 1243–1256, https://doi.org/10.1111/1755-0998.12670 (2017).
15. Thomas, S., Underwood, J. G., Tseng, E., Holloway, A. K. & Informatics, B. B. C. Long-Read sequencing of chicken transcripts and identification of new transcript isoforms. Plos One 9, e94650, https://doi.org/10.1371/journal.pone.0094650 (2014).
16. Jones, G. & Teeling, E. C. The evolution of echolocation in bats. Trends Ecol Evol 21, 149–156, https://doi.org/10.1016/j.tree.2006.01.001 (2006).
17. Mao, X. et al. Historical introgression and the persistence of ghost alleles in the intermediate horseshoe bat (Rhinolophus affinis). Mol Ecol 22, 1035–1050, https://doi.org/10.1111/mec.12154 (2013).
18. Mao, X., Zhu, G., Zhang, L., Zhang, S. & Rossiter, S. J. Differential introgression among loci across a hybrid zone of the intermediate horseshoe bat (Rhinolophus affinis). BMC Evol Biol 14, 154, https://doi.org/10.1186/1471-2148-14-154 (2014).
19. Teeling, E. C. et al. Bat biology, genomes, and the Bat1K project: to generate chromosome-level for all living bat species. Annu Rev Anim Biosci 6, 23–46, https://doi.org/10.1146/annurev-animal-022516-022811 (2018).
20. Jebb, D. et al. Six reference-quality genomes reveal evolution of bat adaptations. Nature 583, 578–584, https://doi.org/10.1038/s41586-020-2486-3 (2020).
21. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659, https://doi.org/10.1093/bioinformatics/btl158 (2006).
22. Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol 35, 543–548, https://doi.org/10.1093/molbev/msx319 (2018).
23. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR12062845 (2020).
24. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR12062844 (2020).
25. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR12062843 (2020).
26. Ma, L. et al. TSA: PacBio full-length transcriptome sequencing from cochleae of two echolocating bats. GenBank https://identifiers.org/ncbi/insdc:GIRV00000000.1 (2020).
27. Ma, L. et al. TSA: PacBio full-length transcriptome sequencing from cochleae of two echolocating bats. GenBank https://identifiers.org/ncbi/insdc:GIRW00000000.1 (2020).
28. Ma, L. et al. TSA: PacBio full-length transcriptome sequencing from cochleae of two echolocating bats. GenBank https://identifiers.org/ncbi/insdc:GIRX00000000.1 (2020).
29. Ma, L. et al. PacBio full-length transcriptome sequencing from cochleae of two echolocating bats. figshare https://doi.org/10.6084/m9.figshare.c.5043656 (2020).
30. Ranum, P. T. et al. Insights into the biology of hearing and deafness revealed by single-cell RNA sequencing. Cell Rep 26, 3160–3171, https://doi.org/10.1016/j.celrep.2019.02.053 (2019).

Acknowledgements

We thank Jiaying Wang, Yuting Ding and Wenli Chen for assistance with sample collection. This work was supported by the National Natural Science Foundation of China (No. 31570378).

Author information

Lu Ma
Present address: Changsha Central Hospital, University of South China, Changsha, 410011, China

Authors and Affiliations

School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200062, China
Lu Ma, Haijian Sun & Xiuguang Mao
Institute of Eco-Chongming (IEC), East China Normal University, Shanghai, 200062, China
Xiuguang Mao

Contributions

L.M. and H.J.S. analyzed data. L.M. wrote the manuscript. We would like to thank Mr. Duncan K. Gichuki from Wuhan Botanical Garden, Chinese Academy of Sciences for improving English. X.G.M. conceived and supervised the project, and revised the manuscript.

Corresponding author

Correspondence to Xiuguang Mao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

About this article

Cite this article

Ma, L., Sun, H. & Mao, X. Transcriptome sequencing of cochleae from constant-frequency and frequency-modulated echolocating bats. Sci Data 7, 341 (2020). https://doi.org/10.1038/s41597-020-00686-w

Received: 07 July 2020
Accepted: 14 September 2020
Published: 13 October 2020
DOI: https://doi.org/10.1038/s41597-020-00686-w