Comparative genomics sheds light on the predatory lifestyle of accipitrids and owls

Raptors are carnivorous birds including accipitrids (Accipitridae, Accipitriformes) and owls (Strigiformes), which are diurnal and nocturnal, respectively. To examine the evolutionary basis of adaptations to different light cycles and hunting behavior between accipitrids and owls, we de novo assembled besra (Accipiter virgatus, Accipitridae, Accipitriformes) and oriental scops owl (Otus sunia, Strigidae, Strigiformes) draft genomes. Comparative genomics demonstrated four PSGs (positively selected genes) (XRCC5, PRIMPOL, MDM2, and SIRT1) related to the response to ultraviolet (UV) radiation in accipitrids, and one PSG (ALCAM) associated with retina development in owls, which was consistent with their respective diurnal/nocturnal predatory lifestyles. We identified five accipitrid-specific and two owl-specific missense mutations and most of which were predicted to affect the protein function by PolyPhen-2. Genome comparison showed the diversification of raptor olfactory receptor repertoires, which may reflect an important role of olfaction in their predatory lifestyle. Comparison of TAS2R gene (i.e. linked to tasting bitterness) number in birds with different dietary lifestyles suggested that dietary toxins were a major selective force shaping the diversity of TAS2R repertoires. Fewer TAS2R genes in raptors reflected their carnivorous diet, since animal tissues are less likely to contain toxins than plant material. Our data and findings provide valuable genomic resources for studying the genetic mechanisms of raptors’ environmental adaptation, particularly olfaction, nocturnality and response to UV radiation.

However, recent studies have reported that some raptors rely more on olfactory cues than visual acuity 13,14 . A previous approach to molecular characterization of the avian olfactory system using PCR (Polymerase Chain Reaction) amplification of olfactory receptor genes (ORs) with degenerate primers 15 generated an approximate measurement of OR quantity. The quantity of ORs tended to be over-estimated through PCR with degenerate primers and only incomplete fragments were produced 16 . By comparison, de novo assembled genomes contributed to a global OR repertoire assessment 17 and can therefore be employed to obtain a more accurate OR repertoire estimation. Raptors also rely on other senses to successfully hunt and consume prey, and to avoid unpalatable and toxic food. For example, the detection of bitter tasting food is reported to protect animals from ingesting poisonous substances 18,19 . TAS2R genes have been reported as being involved in the animal's ability to detect bitter and thus potentially poisonous food [20][21][22] , occurring more frequently in herbivores than in carnivores 23 . Herbivores have undergone a stronger selective pressure to retain TAS2R genes because plant tissues are more likely to contain toxins than animal tissues 24 . With an increasing number of raptor genomes, it is meaningful to compare TAS2R genes in birds with different dietary lifestyles to determine whether detection of bitterness is important to raptors. Thus, analyses of TAS2R variation in select bird species can provide valuable information on the use of taste in hunting raptors, in addition to visual and olfactory cues.
We consequently assembled draft whole-genome sequences of Accipiter virgatus (besra) and Otus sunia (oriental scops owl) to better understand the evolutionary adaptations related to the predatory lifestyle in diurnal accipitrids and nocturnal owls. We aimed to provide an insight into genomic changes affecting physiological functions (response to UV radiation and DNA damage repair) in diurnal accipitrids and retina development in owls. We annotated ORs in 13 bird species to demonstrate the diversification of raptors' ORs repertoire. Furthermore, we aimed to compare the TAS2R gene number in bird species with varied diets. This work will provide novel insights into the evolutionary history and the genetic basis of adaptations to hunting of diurnal accipitrids and nocturnal owls, and a solid foundation for future raptor genetic and epigenomic studies.

Results
Genome sequencing, assembly and quality assessment. Muscle  Bird phylogeny, divergence and evolution of gene families. We identified 16,530 gene families for 13 bird species (four accipitrids: (besra (A. virgatus), bald eagle (Haliaeetus leucocephalus), white-talied eagle (Haliaeetus albicilla), golden eagle (Aquila chrysaetos)), three owls (Oriental scops owl (O. sunia), barn owl (Tyto alba), northern spotted owl (Strix occidentalis)), two falcons (peregrine falcon (Falco peregrinus), saker falcon (Falco cherrug)), zebra finch (T. guttata), red junglefowl (G. gallus), chuck-will's-widow (Antrostomus carolinensis), and brown kiwi (Apteryx australis)) (Supplementary Fig. S8 and Table S16), of which 2,845 represented 1:1 orthologous gene families. Comparison of orthologous gene clusters among accipitrids, falcons, owls, T. guttata, and G. gallus is showed in Fig. 1b. The maximum likelihood phylogeny constructed based on the 1:1 orthologous genes indicated that owls and accipitrids belong to a subclade that was most likely derived from a common ancestor approximately 42.9 million years ago (Mya) (Fig. 1a) Table S17). A top gene in our selection scan was XRCC5, which was also called ku80. XRCC5 is important for the repair of DNA ends by non-homologous end joining (NHEJ) (Fig. 2a). Deletion of XRCC5 in mice has resulted in immune deficiency, growth retardation, increased chromosomal instability, and cancer 25,26 . Altered expression of XRCC5 has caused oncogenic phenotypes, such as genomic instability, hyper proliferation and resistance to apoptosis, and tumorigenesis 27 , and has been detected in various types of sporadic cancer 28 . We found five missense mutations in XRCC5 in accipitrids (Fig. 2b), of which three missense mutations were predicted to be deleterious by PolyPhen-2 29 (Table 1). Further validation showed that all accipitrids had the same amino acid type as A. virgatus, H. leucocephalus, H. albicilla and A. chrysaetos at all mutation sites (Fig. 2b). Therefore, it was very clear that the mutations at XRCC5 were accipitrid-specific.
In response to ionizing radiation, SIRT1 was reported to play a vital part in DNA repair 30,31 . SIRT1 null MEF cells were observed to be more sensitive than control MEF cells in response to UV irradiation, indicating that SIRT1 was possibly involved in UV-induced DNA repair 32 . Murine double minute 2 (MDM2) was an important component of the response to UV radiation, and cells with decreased levels of MDM2 showed more sensitivity to ionizing radiation 33 . PRIMPOL played a critical role in damage tolerance to UV irradiation during DNA replication 34,35 . Decreased levels of PRIMPOL sensitized mammalian and avian cells to UV irradiation [36][37][38] , suggesting that PRIMPOL was vital for recovery from UV damage.  Table S17). It has been documented that ALCAM can guide retinal axons in Drosophila and rodents 39,40 , and ALCAM has a potential role in human retinal angiogenesis 41 . Angiogenesis is a crucial mechanism in ischemic retinal vasculopathy pathogenesis 42 , and previous studies have demonstrated that ALCAM might participate in both processes 41 . The determination of ALCAM's involvement in these processes is important as ischemic retinal vasculopathies and posterior uveitis have led to vision loss in people in the United States and worldwide 43,44 . Two owl-specific missense mutations were identified in ALCAM (Fig. 3d), and both missense mutations were predicted to affect the protein function by PolyPhen-2 (Table 1). Further validation,  including PCR confirmed that all the owls had the same amino acid type as O. sunia, T. alba and S. occidentalis at both mutation sites, which demonstrated that both mutations at ALCAM were owl-specific. Both missense mutations had a deleterious influence on protein structure (Fig. 3a-c).
olfactory Receptor Genes (oRs). We annotated the ORs in 13 bird species based on putative functionality and seven transmembrane helices (7TM) ( Supplementary Table S18). Generally, the number of ORs in raptors varied between orders: owls had the most, accipitrids medium, and falcons least. Overall, the average number of ORs in raptors was not less than in other bird species. Comparative analyses of the OR repertoire showed that most ORs in avian genomes were the γ subgroup of type 1 OR genes, in accordance with previously sequenced avian genomes 15 . Phylogenetic comparison of OR repertoires suggested that γ ORs in birds did not show an obvious species-specific clustering pattern, which was different from previous studies 45 . The gene family size of ORs in this study was very likely to have directly caused this contrast. Finally, a few γ ORs of the kiwi were basal to many clades containing γ ORs (Fig. 1c).
Dietary Lifestyle. The total number of TAS2R genes of accipitrids, falcons, and owls varied little (Fig. 4a).
There were no pseudogenes in the TAS2R repertoires in all raptors we studied, and the percentage of pseudogenes in birds was much lower than other vertebrates reported 23 . Compared to carnivorous birds, omnivorous birds tended to possess more TAS2R genes, while herbivorous birds possessed the most. It was reported that some TAS2R lineages were enriched because of species-specific gene duplications, and other lineages were relatively duplication free 46 . This phenomenon was also obvious in the phylogenetic tree we constructed (Fig. 4b). The lineages, which were marked with one color, demonstrated a cluster of genes belonging to the same species or group of closely related species. The lineages marked with many colors illustrated genes from distantly related species. There was positive correlation between the phylogenetically independent contrasts (PICs) in the dietary code and the PICs in the TAS2R gene repertoire size (R = 0.656, P = 6.706e-05) (Fig. 4c). Equally, the positive correlation was verified while discarding pseudogenes (R = 0.666, P = 4.859e-05) (Fig. 4c), and the difference between both correlations was not significant.

Discussion
The evolution of genes associated with the response to UV radiation and DNA damage repair may have helped diurnal raptors cope with flying or soaring at high altitudes. This type of adaptation has also been developed by other animals living at high altitude 47 . We identified four genes that are under positive selection in the accipitrid lineage, which were XRCC5, MDM2, PRIMPOL, and SIRT1, and conformed to this adaptation strategy. XRCC5 had five accipitrids-specific mutations that possibly affected the protein function. This finding may be related to the increased exposure of accipitrids to UV radiation compared to non-high-altitude birds. It is possible that potentially important reorganization of the radiation resistance system has taken place in the diurnal raptors since their divergence from other bird species, which may be related to the diurnal raptor lifestyle or ecology. There was a substantial evolutionary modification in the owl visual system 6 due to nocturnal adaptation. Owls possess large and rod-dominant retinas that are extremely sensitive to light, providing them with extraordinary night vision 6 . We found a PSG (ALCAM) associated with retina development, which also had two owls-specific mutations that possibly affected the protein function. This PSG possibly enhances low-light sensitivity, and thus resulted in high visual acuity at night. This gene and gene family expansions related to photoreceptor differentiation or development and retina development possibly worked together to enhance owl nocturnality.
It has long been hypothesized that raptors rely on eyesight for locating prey 12 . We found that the number of ORs in raptors was not less than other studied bird species, suggesting that raptors possibly have similar levels of olfactory sense. Our finding of the dual use of olfaction and vision in raptors is consistent with previous studies 13, 14 .
It has been reported that each TAS2R gene can discern different bitter compounds and different TAS2R genes have contrasting sensitivities to the same bitter compounds 48 . Previous studies assumed that TAS2R gains through gene duplication could raise the number of detectable toxins, while the losses of TAS2Rs would decrease this number 23 . Thus, the herbivores were likely to have more functional TAS2R genes than the carnivores. In this study, the carnivorous birds possessed the least TAS2R genes, while the herbivorous birds the most. Despite our findings being congruent with previous studies, the limited number of avian species and the qualitative dietary codes for species analyzed restricted the statistical power of our analyses, and therefore more species and quantitative dietary information should be employed in the future. Since TAS2R genes are capable of detecting different bitter compounds 48 , a variety of TAS2R genes should be included in future studies. In summary, this is the first report describing the complete Otus genome and Accipiter genome. Comparative genomics confirmed the PSGs associated with the response to UV radiation and DNA damage repair were present in diurnal raptors while not in nocturnal raptors and other bird species. Furthermore, the diversification of OR repertoires highlighted the importance of olfaction in the predatory lifestyle of raptors. We also confirmed that the genomic changes present in the owl genomes enhanced nocturnal vision. Analyses of TAS2R gene number in birds with different dietary lifestyles indicated that the TAS2R repertoire diversity was chiefly determined by feeding habits. Our de novo assembled genomes presented here will provide a resource for the future examination of evolution and adaptation of raptors to their environment and diet, and the genomes will eventually be the useful reference in aiding the long-term conservation of raptors and their genetic diversity, such as the investigation of the evolutionary adaptation and polymorphic microsatellite loci development.
Methods sampling and sequencing. Muscle samples of a wild male A. virgatus and a wild female O. sunia that both died of natural causes, were collected from Laojunshan National Nature Reserve (Yibin, Sichuan Province, China). Collected muscle samples were used for genomic DNA extraction, isolation and sequencing. We used a whole genome shotgun approach on the Illumina HiSeq. 2000 platform to sequence the genome. We constructed two paired-end libraries with insert sizes of 230 base pairs (bp) and 500 bp, and three mate-paired libraries with insert sizes of 2 kb, 5 kb and 10 kb.
Genome size estimation, genome assembly and completeness assessment. Before assembly, a 17-Kmer analysis was performed for estimating the genome size of A. virgatus and O. sunia genomes with 230 bp libraries, respectively. The assemblies were first performed by SOAPdenovo2 49 with the parameters set as "all -d 2 -M 2 -k 35". After using SSPACE 50 to build super-scaffolds, Intra-scaffold gaps were then filled using Gapcloser with reads from short-insert libraries. CEGMA 51 and BUSCO 52 were used to evaluate the genome completeness.
Gene prediction and annotation. We combined the de novo and homology-based prediction to identify PCGs in A. virgatus and O. sunia. The de novo prediction was performed on the assembled genomes with repetitive sequences masked as "N" based on the HMM (hidden Markovmodel) algorithm. AUGUSTUS 53 and GENSCAN 54 programs were executed to find coding genes, using appropriate parameters. For the homology prediction, proteins of G. gallus, F. cherrug, F. peregrinus, and humans were mapped onto both genomes using TblastN 55 with an E-value cutoff of 1E-5. To obtain the best matches of each alignment, the results yielded from TblastN were processed by SOLAR 56 . Homologous sequences were successively aligned against the matching gene models using GeneWise 57 . We used EVidenceModeler (EVM) 58 to integrate the above data and obtained a consensus gene set. We used Repeatmasker 59 to identify the repetitive sequences in the genomes of A. virgatus and O. sunia. tRNA genes were identified by tRNAscan-SE 60 . A. virgatus and O. sunia genomes were aligned against the rRNAs database to identify rRNA with blastN. We used INFERNAL 61 by searching against the Rfam database with default parameters to identify the other ncRNAs, including miRNA and snRNA.
Functional annotation. Functional annotation of the A. virgatus and O. sunia genes was undertaken based on the best match derived from the alignments to proteins annotated in SwissProt and TrEMBL databases 62 . Functional annotation used BLASTP tools with the same E-value cut-off of 1E-5. Descriptions of gene products from Gene Ontology (GO) ID were retrieved based on the results of SwissProt. We also annotated proteins against the NCBI non-redundant (Nr) protein database. The motifs and domains of genes were annotated using InterProScan 63 against publicly available databases, including ProDom 64 , PRINTS 65 , PIRSF 66 , Pfam 67 , ProSiteProfiles 68 , PANTHER 69 , SUPERFAMILY 70 , and SMART 71 . To find the best match and involved pathway for each gene, all genes were uploaded to KAAS 72 , a web server for functional annotation of genes against the manually corrected KEGG genes database by BLAST, using the bi-directional best hit (BBH) method.
Analyses of gene family, phylogeny, and divergence. We used orthoMCL 73 to define orthologous genes from 13 avian genomes (Supplementary Table S19). Phylogenetic analyses of these 13 birds were constructed using 1:1 orthologous genes. Coding sequences from each 1:1 orthologous family were aligned by PRANK 74 and concatenated to one sequence for each species for building the tree. Modeltest (ver. 3.7) was used to select the best substitution model 75 . RAxML 76 was then applied to reconstruct ML phylogenetic trees with 1,000 bootstrap replicates. Divergence time estimation was performed by PAML MCMCTREE 77 . positive selection analyses. The above alignments of 1:1 orthologous genes and phylogenetic tree were used to estimate the ratio of the rates of non-synonymous to synonymous substitutions (ω) per gene by ML with the codeml program within PAML 77 under the branch-site model. We then performed a likelihood ratio test and identified the PSGs of the accipitrid and owl branches, respectively.
Validation of the species-specific missense mutations with protein sequences obtained from Genbank. In order to validate the species-specific missense mutations in genes mentioned above, we downloaded all available protein sequences of birds, 6 mammals (human, macaque, mouse, cow, dog, and rabbit), and one reptile (alligator) of each gene from Genbank. All these protein sequences, together with protein sequences identified in genomes assembled in this study, were aligned using MEGA7 78 for each gene to validate the species-specific missense mutations. protein structure determination. The crystal structure of ALCAM was obtained from SWISS-MODEL 81 .
We converted the PDB files to PQR format with PDB2PQR server 82 . The PDB files were used for visualization of cartoon and surface representations of gene mutants. The visualization of the electrostatic surface potential was conducted using the APBS plugin in PyMOL 83 .

Analyses of olfactory receptor genes (oRs).
We constructed an OR database based on reference OR protein sequences downloaded from uniprot 84 . Thirteen studied avian genomes were aligned to the ORs database constructed above with TBLASTN. According to the results, we confirmed the intact olfactory receptor genes by a series of steps 85 . The OR gene repertoire estimated above and three non-ORs downloaded from Genbank were used for comparative phylogenetic analyses. The phylogenetic tree was constructed using the neighbor-joining method implemented in MEGA7. The reliability of the phylogenetic trees was evaluated with 1,000 bootstrap replicates.
Analyses of TAS2R genes. TAS2R database was constructed based on reference TAS2R protein sequences downloaded from uniprot 84 . The genomes of 10 omnivorous birds, 7 herbivorous birds, 7 studied raptors, and another 6 carnivorous birds were aligned to the TAS2R database constructed above with TBLASTN (Supplementary Table S22). We followed a previous study 23 in identifying TAS2R genes. A neighbor-joining tree of 81 protein sequences of intact TAS2Rs was constructed using MEGA7 with Poisson-corrected gamma distances. The reliability of the estimated tree was evaluated by the bootstrap method with 1,000 bootstrap replications. The package Analyses of Phylogenetics and Evolution 86 was used to perform the PIC analyses 87 , and the tree used in this analysis was built with 12 mitochondrial PCGs via RAxML with 1,000 bootstraps.

Data Availability
Genome and DNA sequencing data of besra and oriental scops owl have been deposited into the NCBI Sequence Read Archive (SRA) under the BioProject ID PRJNA420185. All other data supporting the findings of this study are available in the article and its Supplementary Material or are available from the authors upon request. More detailed information for the approaches has been revealed in the Supplementary Material.