Six new reference-quality bat genomes illuminate the molecular basis and evolution of bat adaptations

Bats account for ~20% of all extant mammal species and are considered exceptional given their extraordinary adaptations, including biosonar, true flight, extreme longevity, and unparalleled immune systems. To understand these adaptations, we generated reference-quality genomes of six species representing the key divergent lineages. We assembled these genomes with a novel pipeline incorporating state-of-the-art long-read and long-range sequencing and assembly techniques. The genomes were annotated using a maximal evidence approach, de novo predictions, protein/mRNA alignments, Iso-seq long read and RNA-seq short read transcripts, and gene projections from our new TOGA pipeline, retrieving virtually all (>99%) mammalian BUSCO genes. Phylogenetic analyses of 12,931 protein coding-genes and 10,857 conserved non-coding elements identified across 48 mammalian genomes helped to resolve bats’ closest extant relatives within Laurasiatheria, supporting a basal position for bats within Scrotifera. Genome-wide screens along the bat ancestral branch revealed (a) selection on hearing-involved genes (e.g LRP2, SERPINB6, TJP2), which suggest that laryngeal echolocation is a shared ancestral trait of bats; (b) selection (e.g INAVA, CXCL13, NPSR1) and loss of immunity related proteins (e.g. LRRC70, IL36G), including pro-inflammatory NF-kB signalling; and (c) expansion of the APOBEC family, associated with restricting viral infection, transposon activity and interferon signalling. We also identified unique integrated viruses, indicating that bats have a history of tolerating viral pathogens, lethal to other mammal species. Non-coding RNA analyses identified variant and novel microRNAs, revealing regulatory relationships that may contribute to phenotypic diversity in bats. Together, our reference-quality genomes, high-quality annotations, genome-wide screens and in-vitro tests revealed previously unknown genomic adaptations in bats that may explain their extraordinary traits.

To obtain genome assemblies of high contiguity and completeness, we developed novel 108 pipelines incorporating state-of-the-art sequencing and assembly. To ascertain the position of 109 Chiroptera within Laurasiatheria and thus resolve a long-standing phylogenetic debate 14 , we 110 mined these near complete genomes to produce a comprehensive orthologous gene data set 111 (12,931), including data from 42 other representative mammalian genomes (TableS1), and 112 applied a suite of diverse phylogenetic approaches. To identify molecular changes in regions 113 of the genome -both coding and non-coding -that underlie bat adaptations, we carried out 114 selection tests, analysed gains and losses of genes, and experimentally validated novel bat 115 microRNAs. We focussed on assessing the shared commonalities between the bat species 116 enabling us to infer the ancestral selection driving key bat adaptations. We elucidated the 117 diversity of endogenous viruses contained within the bat genomes, exploring bats' putative 118 history with these viruses. Herein, we present the first six reference-quality bat genomes, which 119 we make available in the open access Bat1K browser (also available on NCBI and GenomeArk) 120 and demonstrate both the value of highly contiguous and highly complete genomes and the 121 utility of bats as model organisms to address fundamental questions in biology. PacBio reads were assembled into contigs with a customized assembler called DAmar, a hybrid 128 of our earlier Marvel 18 , Dazzler (https://dazzlerblog.wordpress.com/), and Daccord 19,20 systems 129 ( Fig. 1a). Next, we used PacBio reads and 10x read cloud Illumina data to remove base errors, 130 which was followed by identifying and phasing all regions of the contigs that had a sufficient 131 rate of haplotype heterogeneity. We retained one haplotype for each region, yielding primary 132 contigs. These primary contigs were then scaffolded using a Bionano optical map and the Hi-C 133 data (see supplementary methods section 2). 134 135 For all six bats, this sequencing and assembly strategy produced assemblies with contig N50 136 values ranging from 10.6 to 22.2 Mb (Fig. 1b, Table S2). Thus, our contigs are ≥355 times 137 more contiguous than the recent Miniopterus assembly generated from short read data 21 , and 138 ≥7 times more contiguous than a previous Rousettus assembly generated from a hybrid of short 139 and long read data 22 (Fig. 1b). Our scaffold N50 values ranged from 80.2 to 171.1 Mb and were 140 often limited by the size of chromosomes (Fig. 1b, Table S2). We estimated that 87 to 99% of 141 each assembly is in chromosome-level scaffolds (Table S3). Consensus base accuracies across 142 the entire assembly range from QV 40.8 to 46.2 (Table S2) for the six bats (where QV 40 143 represents 1 error in 10,000 bp). Since the algorithms for assembling, scaffolding, and 144 haplotyping are an active area of research 23 , we expect that in the future even more complete 145 genome reconstructions can be produced with the data we collected. Even so, our current 146 strategy and algorithms generated chromosome-level assemblies of the six bats with 147 unprecedented contiguity, which are comparable to the best reference-quality genomes 148 currently generated for any eukaryotic species with a complex, multi-gigabyte genome 24 . 149 Importantly, they meet the Vertebrate Genome Project (VGP) minimum standard of 3.4.2QV40 150 and have been added to the VGP collection. 151

152
To assess genome completeness, we first evaluated the presence of 4,104 genes that are highly 153 conserved among mammals (BUSCO, Benchmarking Universal Single-Copy Orthologs 25 ). 154 Between 92.9 and 95.8% of these genes were completely present in our assemblies, which is 155 comparable to the assemblies of human, mouse, and other Laurasiatheria (Fig. 1c, Table S4). 156 Second, to assess completeness in non-exonic genomic regions, we determined how many of 157 197 non-exonic ultraconserved elements (UCEs) 26 align at ≥85% identity to the human 158 sequence. As expected, the vast majority of UCEs were detected in all assemblies (Fig. 1d). 159 Two to four UCEs were not detected in Miniopterus, dog, cat, and cow due to assembly 160 incompleteness (i.e. assembly gaps; Table S5, Fig. S1). In the bat genomes reported herein, no 161 UCEs were missing due to assembly incompleteness. Instead, one to three UCEs were not 162 detected in our Myotis and Pipistrellus assemblies because the UCE sequences are more than 163 85% diverged (Table S5), a striking result given that UCE's are highly conserved across other 164 more divergent mammals (e.g. human-mouse-rat comparison). To determine if this sequence 165 divergence was caused by base errors in the assemblies, we aligned raw PacBio and Illumina 166 reads and sequencing data of related bats, which confirmed that these UCEs are truly diverged 167 (Figs. S1-S5). In summary, our six bat assemblies are highly complete and revealed the first 168 examples of highly diverged UCEs. 169 170

Genome Annotation 171
To comprehensively annotate genes, we integrated a variety of evidence (Fig. 1e). First, we 172 aligned protein and cDNA sequences of a related bat species to each of our six genomes (Table  173 S6). Second, we projected genes annotated in human, mouse 27 and two bat assemblies (Myotis 174 lucifugus (Ensembl) and Myotis myotis (Bat1K)) to our genomes via whole-genome 175 alignments 28 . Third, we generated de novo gene predictions by applying Augustus 29 with a 176 trained bat-specific gene model in single-genome mode to individual genomes, and in 177 comparative mode to a multiple genome alignment including our bat assemblies. Fourth, we 178 integrated transcriptomic data from both publicly available data sources and our own Illumina 179 short read RNA-seq data (Table S7). Additionally, we generated PacBio long read RNA 180 sequences (Iso-seq) from all six species to capture full-length isoforms and accurately annotate 181 untranslated regions (UTRs) (Table S8). Iso-seq data were processed using the TAMA 182 pipeline 30 which allowed capturing a substantially greater diversity of transcripts and isoforms 183 than the default pipeline (https://github.com/PacificBiosciences/IsoSeq3). All transcriptomic, 184 homology-based and ab initio evidence were integrated into a consensus gene annotation that 185 we further enriched for high-confidence transcript variants and filtered for strong coding 186 potential. 187

188
For the six bats, we annotated between 19,122 and 21,303 coding genes (Fig. 1f) genes. Importantly, this gene annotation completeness of our bats is higher than the Ensembl 197 gene annotations of dog, cat, horse, cow and pig, and is only surpassed by the gene annotations 198 of human and mouse, which have received extensive manual curation of gene models (Fig. 1f, 199 At ~2 Gb, bat genomes are generally smaller than genomes of other placental mammals that 207 are typically between 2.5 and 3.5 Gb 2 . Nevertheless, our assemblies revealed noticeable 208 genome size differences within bats, with assembly sizes ranging from 1.78 Gb for Pipistrellus 209 to 2.32 Gb for Molossus (Fig. 1g). As genome size is often correlated with transposable element 210 (TE) content and activity, we focused on the genomes of the six bats and seven other 211 representative Boreoeutherian mammals (Laurasiatheria + Euarchontoglires), selected for the 212 highest genome contiguity, and used a previously-described workflow and manual curation to 213 annotate TEs 31 . This showed that TE content generally correlates with genome size (Fig. 1g). 214 Next, we compared TE copies to their consensus sequence to obtain a relative age from each 215 length, which contained 7,913,054 parsimony-informative sites. The best-fit model of sequence 245 evolution for each of the 12,931 gene alignments was inferred using ModelFinder 36 (Table S9). 246 The species tree was then estimated by maximum likelihood using the model-partitioned 247 dataset with IQTREE 37 and rooted on Atlantogenata 38 . Branch-support values were obtained 248 by UFBoot 39 with 1000 bootstrap pseudoreplicates. These analyses led to 100% bootstrap 249 support across the entire tree (Fig. 2a) and seemingly identified the origin of bats within 250 Laurasiatheria. The basal split is between Eulipotyphla and other laurasiatherians (i.e., 251 Scrotifera). Within Scrotifera, Chiroptera is the sister clade to Fereuungulata (Cetartiodactyla 252 + Perissodactyla + Carnivora + Pholidota). This tree disagrees with the Pegasoferae 253 hypothesis 40 , which groups bats with Perissodactyla, Carnivora and Pholidota, but agrees with 254 concatenation analyses of phylogenomic data 41 . Evolutionary studies based on 102 255 retroposons, including ILS-aware analyses, also support a sister-group relationship between 256 Chiroptera and Fereuungulata, but differ from the present analyses in supporting a sister-group 257 relationship between Carnivora and Cetartiodactyla 34,35 . However, as the number of 258 homologous sites increases in phylogenomic datasets, so too does bootstrap support 42 , even 259 sometimes for an incorrect phylogeny 43 , and as non-coding sequences can produce a different 260 topology than coding sequences 44 , we further explored the phylogenomic signal within our 261 genomes. 262

263
To assess whether the tree inferred from the concatenated dataset ( Fig. 2a) is also supported by 264 the non-coding part of the genome, we estimated a phylogeny using the models of best fit 265 (Table S9) for a dataset comprising 10,857 orthologous conserved non-coding elements 266 (CNEs), which contained 5,234,049 nucleotides and 1,225,098 parsimony-informative sites 267 (Table S10), using methods as described above. The result of this analysis (Fig. 2b) supports a 268 tree similar but not identical to that inferred from the protein-coding sequences (Fig. 2a), 269 including a sister-group relationship between Chiroptera and Fereuungulata, but with 270 Perissodactyla more closely related to Carnivora + Pholidota than to Cetartiodactyla. The CNE 271 tree also recovered a different position for Tupaia (Scandentia) within Euarchontoglires. 272 273 Given that two very short branches at the base of Scrotifera define relationships between its 274 four major clades (Carnivora + Pholidota, Cetartiodactyla, Chiroptera, Perissodactyla), this 275 region of the placental tree may be in the "anomaly zone", defined as a region of tree space 276 where the most common gene tree(s) differs from the species tree topology 45 . In the case of 277 four taxa and a rooted pectinate species tree, anomalous gene trees should be symmetric rather 278 than pectinate. To explore this, we estimated the maximum-likelihood support of each protein-279 coding gene (n=12,931) for the 15 possible bifurcating topologies involving four clades, in our 280 case with Eulipotyphla as the outgroup (Fig. S6), and with the sub-trees for the relevant clades 281 identical to those in Fig. 2a. Based on the log-likelihood scores, 2,104 gene alignments 282 supported more than one tree, so these genes were excluded from further analysis. The 283 remaining 10,827 genes supported one fixed tree topology over the other 14 (Table S11) which is also supported by the CNEs (Fig. 2b). This suggests that the majority of the genome 288 supports a sister relationship between Chiroptera and the other Scrotifera. That said, there were 289 four other topologies that had support from >800 genes (Tree14 883/10827; Tree04 865/10827; 290 Tree15 820/10827; Tree13 806/10827) (Fig. 2c). However, even with similar support levels 291 for several topologies, the phylogenetic position for Chiroptera is pectinate on the most 292 common gene tree and does not qualify as anomalous. If the base of Scrotifera is in the anomaly 293 zone, as suggested by coalescence analyses of retroposon insertions 35 , then we may expect the 294 most common gene tree(s) to be symmetric rather than pectinate. We may also expect the 295 species tree based on concatenation to be symmetric instead of pectinate 45 . One explanation for 296 the absence of anomalous gene trees, and for a pectinate species tree based on concatenation, 297 is that both protein-coding genes and CNEs are generally under purifying selection, which 298 reduces both coalescence times and incomplete lineage sorting relative to neutrally evolving 299 loci 46,47 . 300 301 Bias in phylogenetic estimates can also be due to model misspecification, which is an 302 inadequate fit between phylogenetic data and the model of sequence evolution used 48 . 303 Misleading support for incorrect phylogenies can also be due to gene tree error arising from a 304 lack of phylogenetic informativeness amongst data partitions 49 . To overcome these biases, we 305 performed a series of compatibility analyses on each gene partition and across the supermatrix 306 at 1 st , 2 nd and 3 rd codon sites; 1 st + 2 nd codon sites; 1 st + 2 nd + 3 rd codon sites; amino acids, 307 assuming a 4-state alphabet for nucleotides and a 20 state-alphabet for amino acids (see 308 supplementary methods section 4.2). We excluded all alignments for which evidence of 309 saturation of substitutions and thus decay of the historical signal was detected by SatuRation 48 taxa, were considered optimal for phylogenetic analysis (Table S12). We concatenated these 317 data into a supermatrix of 241,098 nucleotides in length with 37,588 informative positions and 318 completed all phylogenetic analyses using methods as described above. However, this reduced 319 data set did not provide an unambiguous phylogenetic estimate. Specifically, while the best-320 supported topology differed from the best trees inferred using all protein-coding genes and 321 CNEs in its position of Chiroptera, which is now sister to Carnivora + Pholidota (Fig. 2d), this 322 node has low bootstrap support (58%; topology 13; Fig. 2d) and Approximately Unbiased (AU) 323 tests could not reject the topologies depicted in Fig. 2a and 2b. Furthermore, the phylogeny 324 inferred from the subset of 488 genes is also symmetric for the four major lineages of 325 Scrotifera, as may be expected if this node is in the anomaly zone and concatenation is 326 misleading. We further analysed these data using a single-site coalescence-based method, 327 SVDquartets 51,52 , which provides an alternative to concatenation. The resulting optimal 328 topology also supported Chiroptera as sister taxa to Fereuungulata (Fig. S7, topology 1), which 329 is the most supported position from all of our analyses and data partitions. Genome-wide screens for gene selection, losses and gains 339 To study the genomic basis of exceptional traits shared by bats, we first performed three 340 unbiased genome-wide screens for gene changes that occurred in the six bats. First, we 341 screened 12,931 genes classified as 1:1 orthologs for signatures of positive selection on the 342 ancestral bat (stem Chiroptera) branch under the aBSREL model 53 using HyPhy 54 and the 343 best-supported phylogeny (Fig. 2a). For genes with significant evidence for selection after 344 multiple test correction (FDR<0.05), we manually inspected the underlying alignment to 345 ensure homology (supplementary methods section 4.3.1), and additionally required that the 346 branch-site test implemented in PAML codeml 55 independently verified positive selection 347 (P<0.05). This revealed 9 genes with a robust signal of positive selection at the bat ancestor 348 (Table S13). While these 9 genes have diverse functions, they included two genes with hearing-349 related functions, which may relate to the evolution of echolocation. These genes, LRP2 (low-350 density lipoprotein receptor-related protein 2, also called megalin) and SERPINB6 relevant for prominent bat traits (i.e. longevity, immunity, metabolism 2 ) we further performed 400 a screen considering 2,453 candidate genes (Table S14) (Table S15, Fig. S10). These genes include IL17D 406 and IL-1β, which are involved in immune system regulation 70 and NF-kB activation (IL-407 1β) 66,71 , and GP2 and LCN2, which are involved in the response to pathogens 72,73 . Interestingly, 408 selection was also inferred for PURB, a gene that plays a role in cell proliferation and regulates 409 the oncogene MYC 74 , which was previously shown to be under divergent selection in bats 16 and 410 which exhibits a unique anti-ageing transcriptomic profile in long lived Myotis bats 8 . Overall, 411 combining genome-wide and candidate gene screens revealed robust patterns of selection in 412 stem Chiroptera on several genes involved in immunity and infection, which suggests that 413 ancestral bats evolved immunomodulatory mechanisms that enabled a higher tolerance to 414 pathogens. 415 416 Second, we used a previously developed approach 75 to systematically screen for gene loss. This 417 revealed 10 genes that are inactivated in our six bats but present in the majority of 418 Laurasiatheria (Table S16). Two of these genes again point to changes in immune function in 419 bats, having immune-stimulating and pro-inflammatory functions; LRRC70 (leucine rich repeat 420 containing 70, also called synleurin) and IL36G (interleukin 36 gamma) (Fig. 3a).  (Table S17). Among these, we inferred 438 an expansion of the APOBEC gene family. Expansion involved APOBEC3-type genes (Fig.  439 3c) and supported a small expansion in the ancestral bat lineage, followed by up to 14 440 duplication events within Chiroptera. The APOBEC3 locus is highly-dynamic, with a complex  (Fig. 4b & Fig. S13). 475 Overall, the highest number of integrations was observed in M. myotis (n=630), followed by 476 Rousettus (n=334) with Phyllostomus containing the lowest (n=126; Fig. 4b, Table S18). 477 Additionally, we detected ERV sequences with hits for alpha-and lenti-retroviruses in 478 reciprocal BLAST searches. Until now, alpharetroviruses were considered as exclusively 479 endogenous avian viruses 92 . Thus, our discovery of endogenous alpharetroviral-like elements 480 in bats is the first record of these sequences in mammalian genomes, widening the known 481 biodiversity of potential hosts for retrovirus transmission. We detected several alpha-like env 482 regions in Phyllostomus, Rhinolophus, and Rousettus (Fig. 4b), showing that multiple and 483 diverse bat species have been and possibly are being infected by alpharetroviruses. We also 484 detected lentivirus gag-like fragments in Pipistrellus, which are rarely observed in endogenized 485 form 93 . 486

487
To identify historical ancestral transmission events, we reconstructed a phylogenetic tree from 488 our recovered ERVs with the known viral protein 'probe' sequences for all six bat genomes 489 and seven mammalian outgroups (Fig. S14). The majority of sequences group as single bat-490 species clusters, suggesting that relatively recent integration events, more than ancestral 491 transmission (Fig. S14)  Overall these results show that bat genomes contain a surprising diversity of ERVs, with some 507 sequences never previously recorded in mammalian genomes, confirming interactions between 508 bats and complex retroviruses, which endogenize exceptionally rarely. These integrations are 509 indicative of past viral infections, highlighting which viruses bat species have co-evolved with 510 and tolerated, and thus, can help us better predict potential zoonotic spillover events and direct 511 routine viral monitoring in key species and populations. In addition, bats, as one of the largest 512 orders of mammals, are an excellent model to observe how co-evolution with viruses can shape 513 the mammalian genome over evolutionary timescales. For example, the expansion of the 514 APOBEC3 genes in bats shown herein, could be a result of a co-evolutionary arms race shaped 515 by ancient retroviral invasions, and could contribute to the restriction in copy number of 516 endogenous viruses in some bat species. Given that these findings were generated from only 517 six bat genomes we can be confident that further cross-species comparison with similar quality 518 bat genomes will bring even greater insight. 519 520 Changes in Non-Coding RNAs 521 In addition to coding genes, changes in non-coding (nc)RNAs can be associated with 522 interspecific phenotypic variation and can drive adaptation 99,100 . We used our reference-quality 523 genomes to comprehensively annotate non-coding RNAs and search for ncRNA changes 524 between bat species and other mammals. To annotate different classes of conserved non-coding 525 RNA genes, we used computational methods that capture characteristic sequence and structure 526 features of ncRNAs ( Fig. 5a; supplementary methods section 5.1). We found that a large 527 proportion of non-coding RNA genes were shared across all six bats (Fig. S15), and between 528 bats and other mammals (e.g. 95.8% ~ 97.4% shared between bats and human). 529 530 Within ncRNAs, we next investigated microRNAs (miRNA), which can serve as 531 developmental and evolutionary drivers of change 101 . We employed a strict pipeline to annotate 532 known miRNAs in our six bat genomes and in the 42 outgroup mammal taxa (Table S19,  533 supplementary methods section 5.1) and investigated how the size of miRNA families evolved 534 using CAFÉ 102 . We identified 286 miRNA families present in at least one mammal and 535 observed massive contractions of these miRNA families (Fig. S16) with an estimated overall 536 rate of 'death' 1.43 times faster than the rate of 'birth' (see supplementary methods section 537 5.1). There were 19 families that significantly (FDR<0.05) contracted in the ancestral bat 538 branch, with no evidence of expansions, and between 4 and 35 miRNA families were 539 contracted across bats (Fig. 5b, Fig. S16). We also inferred the miRNA families lost in each 540 bat lineage using a Dollo parsimony approach, which revealed 16 miRNA families that were 541 lost in the bat ancestor ( Fig. S17 and S18). Interestingly, the oncogenic miR-374 was lost in all 542 bat species but was found in the other examined orders (Table S19). Since miR-374 promotes 543 tumour progression and metastasis in diverse human cancers 103 , this bat specific loss may 544 contribute to low cancer rates in bats 16 . 545 546 Next, we investigated the evolution of single-copy miRNA genes to determine if sequence 547 variation in these miRNAs may be driving biological change. Alignments of 98 highly 548 conserved, single-copy miRNA genes identified across all 48 mammal genomes revealed that 549 one miRNA, miR-337-3p, had unique variation in the seed region in bats compared to all other 550 42 mammals (Fig. 5c). miR-337-3p was pervasively expressed in brain, liver, and kidney across 551 all six bat species (Fig. S19). Given that the seed sequences of microRNAs represent the 552 strongest determinant of target specificity, these changes are expected to alter the repertoire of 553 sequences targeted by miR-337-3p in bats. 554

555
To test this hypothesis, we used reporter assays 104,105 to determine if the bat and human versions 556 of miR-337-3p were functionally active and if they showed species-specific regulation of an 557 "ideal" predicted target sequence (Table S20). While bat miR-337-3p strongly repressed the 558 expression of its cognate bat target sequence, it had no effect on the human site, and vice versa 559 (Fig. 5d). This result demonstrated that the miR-337-3p seed changes found in bats alter its 560 binding specificity. To explore whether this difference in binding specificity changes the set of 561 target genes regulated by bat miR-337-3p, we used our raw Iso-seq data to identify 3'UTRs of 562 coding genes in bats (n=6,891-16,115) and determined possible target genes of miR-337-3p 563 using a custom in silico pipeline (Table S21; supplementary methods section 5.3.5). We also 564 obtained the equivalent human 3'UTRs for all predicted bat 3'UTRs and identified the human 565 miR-337-3p gene targets (supplementary methods section 5.3.5). In bats, miR-337-3p was 566 predicted to regulate a distinct spectrum of gene targets compared to humans (Table S22). GO 567 enrichment analysis of these target gene sets suggests a shift towards regulation of 568 developmental, rhythmic, synaptic and behavioural gene pathways by miR-337-3p in bats (Fig.  569 5e), pointing to a dramatic change in processes regulated by miR-337-3p in bats compared to 570 other mammals. 571 572 In addition to losses and changes in miRNAs, continuous miRNA innovation is observed in 573 eukaryotes, which is suggested as a key player in the emergence of increasing organismal 574 complexity 99 . To identify any novel miRNAs that evolved in bats, we performed deep 575 sequencing of small RNA libraries from brain, liver and kidney for all six bats (Table S23), 576 analysed these data using a comprehensive custom analysis pipeline (see supplementary 577 methods section 5.3.3), and identified those miRNAs that possess seed regions not found in 578 miRBase (release 22). This screen revealed between 122 and 261 novel miRNAs across the six 579 bat genomes. Only a small number of these novel miRNAs were shared across the six bats, 580 supporting rapid birth of miRNAs on bat lineages (Fig. S20). We identified 12 novel miRNAs 581 that were found in all six bats but did not have apparent homologs in other mammals (Table  582 S24). Prediction of miRNAs from genomic sequences alone may result in false positives due 583 to the occurrence of short hairpin-forming sequences that are predicted to form hairpins but are 584 not processed or functionally active, emphasizing the need for experimental testing of these 585 miRNAs. Therefore, to test whether these candidates indeed function as miRNAs we selected 586 the top 3 candidates (bat-miR-4665, bat-miR-19125, bat-miR-6665) (Table S24) based on their 587 expression and secondary structures, and experimentally tested their ability to regulate an ideal 588 target sequence in reporter assays, as above (Table S20). Two of the three miRNAs (miR-589 19125 and miR-6665) were able to regulate their targets, showing that they are actively 590 processed by the endogenous miRNA machinery, and able to be loaded onto the RISC complex 591 to repress target mRNAs (Fig. 5f). Thus, miR-19125 and miR-6665 represent true miRNAs 592 that are novel to bats. Taken together, these data demonstrate innovation in the bat lineage with 593 regard to miRNAs both in seed sequence variation as well as novel miRNA emergence. 594

595
In summary, our genomic screens and experiments revealed losses of ancestral miRNAs, gains 596 of novel functional miRNA and a striking case of miRNA seed change that alters the target 597 specificity. Changes in these miRNAs and their target genes point to a regulatory role in cancer, 598 development and behaviour in bats. Further detailed mechanistic studies will be crucial to 599 determine the role of these miRNAs in bat physiology and evolution. 600

601
Discussion 602 We have used a combination of state-of-the-art methods including long-read, short-read, and 603 scaffolding technologies to generate chromosome level, near-complete assemblies of six bats 604 that represent diversity within Chiroptera. These reference-quality genomes improve on all 605 published bat genomes and are on par with the best reference-quality genomes currently 606 generated for any eukaryotic species with a complex, multi-gigabyte genome. Compared to the 607 contiguity and completeness of previous bat genomes assembled with short reads, our 608 reference-quality genomes offer significant advances. First, while fragmented and incomplete 609 assemblies hamper gene annotation, reference-quality genomes allow comprehensive 610 annotations by integrating a variety of methods and evidence. In particular, reference-quality 611 genomes facilitate genome alignment, which provides a powerful way of transferring gene 612 annotations of related species to new assemblies and ensures that transcriptomic data can be 613 comprehensively mapped. Second, while fragmented and incomplete assemblies resulted in 614 countless efforts by individual labs to laboriously clone and re-sequence genomic loci 615 containing genes of interest, such efforts are not necessary with comprehensively annotated, 616 reference-quality assemblies. Third, reference-quality assemblies are a resource for studying 617 gene regulation by non-coding RNAs and cis-regulatory elements. The high completeness 618 enables a comprehensive mapping of functional genomics data such as miRNA read data and 619 epigenomic data (e.g. ChIP-seq, ATAC-Seq), and the high contiguity is crucial for assigning 620 regulatory regions to putative target genes and linking genotype to phenotype. 621

622
The six reference-quality assemblies coupled with methodological advances enabled us to 623 address the long-standing question of the phylogenetic position of bats within Laurasiatheria. 624 We used our comprehensive gene annotations to obtain the largest set of orthologous genes 625 and homologous regions to date, which enabled us to explore the phylogenetic signal across 626 different genomic partitions. Consistently, a variety of phylogenetic methods and data sets 627 estimate that bats are a sister clade to Fereuungulata and highlight the importance of 628 maximising the genetic coverage and ensuring that the appropriate models and data are used 629 when reconstructing difficult nodes. 630 631 Our comprehensive and conservative genome-wide screens investigating gene gain, loss and 632 selection provide candidates that are likely related to the unique immunity of bats. Furthermore, 633 our screens reveal selection in hearing genes in stem Chiroptera, which is consistent with the 634 hypothesis that echolocation evolved once in bats and was secondarily lost in Pteropodidae, 635 but inconsistent with the alternative hypothesis that echolocation evolved twice independently 636 within bats. As such, our analysis provides molecular evidence informing a long-standing 637 question of when echolocation evolved. We further show that bats have a long coevolutionary 638 history with viruses and identified unique mammalian viral integrations. Finally, we explored 639 the non-coding genome in bats, where we found miRNAs that were novel to bats, lost in bats, 640 or carried bat specific changes in their seed sequence. These important regulators of gene 641 expression point to ancestral changes in the bat genome that may have contributed to 642 adaptations related to the low incidence of cancer in bats, as well as developmental and 643 behavioural processes.   0  50  100  0  100  150  0  50  100  150  200  200  300