Chemosensory adaptations of the mountain fly Drosophila nigrosparsa (Insecta: Diptera) through genomics’ and structural biology’s lenses

Chemoreception is essential for survival. Some chemicals signal the presence of nutrients or toxins, others the proximity of mating partners, competitors, or predators. Chemical signal transduction has therefore been studied in multiple organisms. In Drosophila species, a number of odorant receptor genes and various other types of chemoreceptors were found. Three main gene families encode membrane receptors and one encodes globular proteins that shuttle compounds with different degrees of affinity and specificity towards receptors. By sequencing the genome of Drosophila nigrosparsa, a habitat specialist restricted to montane/alpine environments, and combining genomics and structural biology techniques, we characterised odorant, gustatory, and ionotropic receptors and odorant binding proteins, annotating 189 loci and modelling the protein structure of two ionotropic receptors and one odorant binding protein. We hypothesise that the D. nigrosparsa genome experienced gene loss and various evolutionary pressures (diversifying positive selection, relaxation, and pseudogenisation), as well as structural modification in the geometry and electrostatic potential of the two ionotropic receptor binding sites. We discuss possible trajectories of chemosensory adaptation, which may have enhanced compound affinity and mediated the evolution of more specialised food use and a fine-tuned mechanism of adaptation.

Based on gapped regions flanking or overlapping exons, we distinguished between partial and putative full-length loci. Of the 55 loci, 42 (76%) were recovered with a complete gene model. Putative full-length D. nigrosparsa ORs were quite divergent, sharing between 44% (DnigOr85a) and 97% (DnigOr30a) protein identity with their most closely related amino acid sequence (average 70%).
Sequence identity was greater with S. flava and D. virilis than with the other species. With these two, protein domain predictors found the 7tm_6 PFAM domain in all loci, and a more detailed analysis of the putative full-length genes predicted between one and seven transmembrane helices (TMHs; Figure S2a). Thirty-three loci were assigned as functional, while the remaining nine were assigned as putative pseudogenes. The proportion of amino acid residues aligned to the PFAM protein domain varied between 82% and 100%, with a mean of 96%. Taking into account only full-length loci, we found that receptors in clade V, encompassing Or67d to Or85c (Figure 1), had a distribution of predicted TMHs biased towards six TMHs rather than seven. Namely, 89% of sequences were predicted with six TMHs, compared with 17% in the other clades. Looking at the domain identities of this cluster, we observed a significantly lower value in clade V (average identity of 22%) than in all other clades (average identity of 25%; p < 0.002, Wilcoxon rank-sum test; Figure S2b).
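The clade comparison above relies on a Wilcoxon rank-sum test. As an illustration only, the following minimal Python sketch shows how such a two-sided test can be computed from two lists of per-locus identity values. It assumes a normal approximation with tie correction and no continuity correction; the function name `rank_sum_test` is hypothetical, and the software and settings actually used for the reported p-values are not stated here.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, two-sided p-value).
    Assumption: no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    # Pool the observations, remembering which sample each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2

    # Assign average ranks to tied values and accumulate the tie term.
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    # Rank sum and U statistic for the first sample.
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical
        return u1, 1.0

    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p
```

In use, `x` would hold the domain identities of clade V loci and `y` those of all other clades. For small samples an exact test would be preferable to the normal approximation.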

Identification of candidate gustatory receptors.
Forty-seven candidate loci for the GR gene family were annotated. Based on read splicing and gene structure, three of these are putatively transcribed into different splice variants: two loci with two isoforms, and one with four (Table S2). Looking at the full-length protein sequence diversity, sequence identities among D. nigrosparsa GRs and other Drosophila subgenera species have a slightly wider distribution, varying between 53% (DnigGr39aA) and 97% (DnigGr29bB), with an average identity of 76%. The highest sequence identity was most often observed with D. virilis (21 best hits), followed by D. grimshawi (11 best hits).
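The identity percentages reported for ORs and GRs depend on how identity is counted over an alignment. The sketch below shows one common convention, assumed here for illustration: matches divided by the number of alignment columns in which neither sequence has a gap. The function name `percent_identity` and the choice of denominator are assumptions; the convention actually used for the reported values is not stated in this section.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned protein sequences.

    Assumption: columns containing a gap ('-') in either sequence are
    excluded from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Other denominators (full alignment length, or the length of the shorter sequence) give lower values for gapped alignments, so identities are only comparable when the same convention is applied throughout.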

Identification of candidate ionotropic receptors.
We annotated both subfamilies and found a total of 54 loci. The phylogenetic family tree grouped all genes into 67 clusters (Figure 3). All 14 members of the three iGluR subfamilies (ten Kainate, two AMPA, and two NMDA receptors) were identified. D. nigrosparsa loci clustered as single-copy genes; only for three genes were two in-paralogs per cluster found (CG11155, Ir75b, Ir87a; Figure 3). Of the 39 loci, 26 (67%) were recovered with a complete gene model, of which only one was assigned as a putative pseudogene (DnigIr94d). We observed a significantly decreasing sequence similarity from iGluRs (median identity of the best hit: 91%) to antennal (81%) to divergent IRs (71%) (p-values < 0.001, Wilcoxon rank-sum test).

A different degree of divergence among subfamilies was observed for the Lig_chan-Glu_bd and Lig_chan domains, which shared a sequence identity of 40% and 49%, respectively, in iGluRs. In contrast, divergence increased significantly in the other two IR subfamilies, antennal and divergent. In detail, similarity of the Lig_chan-Glu_bd domain decreased from 24% in antennal to 18% in divergent IRs, and similarity of the Lig_chan domain decreased from 26% in antennal to 16% in divergent IRs (adjusted p < 0.05, Wilcoxon rank-sum test; Figure S2d).

Identification of candidate odorant binding proteins.
Thirty-two candidate OBP loci were identified and aligned to compute the phylogenetic family tree, which grouped all genes into 57 clusters. Drosophila nigrosparsa OBP loci clustered in 31 of them, with two in-paralogs in the Obp58b cluster. Twenty-four of these clusters (77%) were recovered as monophyletic with significant node support (bs ≥ 75), one as paraphyletic although not supported (Obp19d), and six as monophyletic without statistical support (Figure 4). In total, 44 gene clusters were present across the Drosophila subgenus species, with 41 to 42 in each species, and between 10 and 11 loci were missing from our annotation (Figure 4).
Putative full-length OBPs were recovered for 30 of the 32 annotated loci.
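The OBP counts above can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated in the text; the variable names are illustrative, and the full-length percentage is derived here rather than reported in the source.

```python
# Counts taken from the text.
annotated_loci = 32
dnig_clusters = 31
supported_monophyletic = 24
paraphyletic_unsupported = 1
monophyletic_unsupported = 6
clusters_per_species = (41, 42)  # range across Drosophila subgenus species
full_length_loci = 30

# One cluster (Obp58b) holds two in-paralogs, so loci = clusters + 1.
extra_inparalogs = annotated_loci - dnig_clusters

# The three cluster categories should account for every cluster.
category_total = (
    supported_monophyletic + paraphyletic_unsupported + monophyletic_unsupported
)

# Share of clusters with significant node support (reported as 77%).
pct_supported = round(100 * supported_monophyletic / dnig_clusters)

# Loci missing relative to the other species (reported as 10 to 11).
missing_range = tuple(c - dnig_clusters for c in clusters_per_species)

# Derived value, not stated in the text: share of full-length loci.
pct_full_length = round(100 * full_length_loci / annotated_loci)

print(extra_inparalogs, category_total, pct_supported, missing_range, pct_full_length)
```

The missing-locus range follows directly from subtracting the 31 D. nigrosparsa clusters from the 41 to 42 clusters found per species.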
For each selected non-synonymous substitution in all loci under positive selection, its sequence position and flanking region were locally aligned to show the conservation level.