A model species for agricultural pest genomics: the genome of the Colorado potato beetle, Leptinotarsa decemlineata (Coleoptera: Chrysomelidae)

The Colorado potato beetle is one of the most challenging agricultural pests to manage. It has shown a spectacular ability to adapt to a variety of solanaceaeous plants and variable climates during its global invasion, and, notably, to rapidly evolve insecticide resistance. To examine evidence of rapid evolutionary change, and to understand the genetic basis of herbivory and insecticide resistance, we tested for structural and functional genomic changes relative to other arthropod species using genome sequencing, transcriptomics, and community annotation. Two factors that might facilitate rapid evolutionary change include transposable elements, which comprise at least 17% of the genome and are rapidly evolving compared to other Coleoptera, and high levels of nucleotide diversity in rapidly growing pest populations. Adaptations to plant feeding are evident in gene expansions and differential expression of digestive enzymes in gut tissues, as well as expansions of gustatory receptors for bitter tasting. Surprisingly, the suite of genes involved in insecticide resistance is similar to other beetles. Finally, duplications in the RNAi pathway might explain why Leptinotarsa decemlineata has high sensitivity to dsRNA. The L. decemlineata genome provides opportunities to investigate a broad range of phenotypes and to develop sustainable methods to control this widely successful pest.

. Clusters of genes encoding cuticle proteins in the genome of Leptinotarsa decemlineata. 321 Table 23S. Genes associated with RNA interference in Leptinotarsa decemlineata.  Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. USDA is an equal opportunity provider and employer.
incomplete. Assuming direct concatenation of these four scaffolds, the Hox cluster would span a region of 3.7 Mb, similar to the estimated 3.5 Mb Hox cluster of A. glabripennis [2]. However, as the L. decemlineata genome is only about half the size, this reflects disproportionately large intergenic distances, as individual gene loci are not larger (see below). This may be due to possible local misassemblies with wrongly included sequence on these scaffolds. In the much smaller genome of T. castaneum, the Hox cluster's relative size is comparable to that seen in A.
For the small Iroquois Complex, while L. decemlineata has single, clear orthologs of both iroquois (iro) and mirror (mirr), here they are not linked in the current assembly (Supplementary Figure 1SB). Direct end-to-end concatenation of the relevant scaffolds would give a cluster size of 258 Kb, which at 74% of the A. glabripennis Iro-C cluster size is rather more proportional to genome size. As would be expected for an Iro-C cluster, the concatenated reconstruction in L. decemlineata includes no predicted genes between iro and mirr, and the two genes would have the same transcriptional orientation.
Leptinotarsa decemlineata Hox gene loci tend to be the same size or smaller than in A. glabripennis, encoding proteins of comparable size and on average only 10% longer than in T.
castaneum. In all three beetle species, Hox protein sequence similarities were over 70%, and intron splice sites were largely conserved. Minor exceptions to these trends are the functionally diverged Hox genes zerknüllt (zen) and fushi tarazu (ftz), which might be expected to exhibit higher levels of sequence divergence [26][27][28]. These genes' loci are 10 and 6-fold larger, respectively, than in Anoplophora. Additionally, the protein sequence conservation across all three beetles was around 50% for ftz, and under 40% for zen.  Figure 3SB), and the lack of an ortholog in A. glabripennis (syntenic or otherwise) casts doubt on whether this is an accurate gene model.

Non-coding RNA Annotation
We identified 85 putative miRNA loci representing 61 miRNA families in L. decemlineata (Supplementary Dataset 2). One Coleoptera-specific miRNA, miR-3849, was found. This miRNA was shared by both T. castaneum and L. decemlineata, but is not known from other insect lineages. Interestingly, miR-3849 was located in the first intron of the wingless gene in both species, suggesting that this miRNA might have an important role in the development of coleopteran insects.

Odorant Binding Proteins
Odorant binding proteins (OBP) are small soluble proteins that are highly abundant in the insect sensillar lymph. They are believed to solubilize hydrophobic odor molecules and transport them to the receptors that are housed in the membrane of olfactory sensory neuron dendrites [29][30][31]. OBPs have in some cases also been shown to influence odor responses, thus they might act as a pre-receptor filter, shaping the response specificity of olfactory sensory neurons (e.g. In total, 59 OBP genes were identified in the L. decemlineata genome assembly ( [34]. All of the OBPs reported in this study could be re-identified in the genome, with overall consistent protein sequence prediction (see Supplementary Table 13S for discrepancies).
Numbers to our OBP genes were given based on their positioning on genomic scaffolds and groupings in the tree (Supplementary Figure 5S). While our numbers do not correspond to the numbers in Liu et al.
[34], we reference their classification along with genomic location and additional details of the OBP genes (Supplementary Table 13S).
Of the 59 L. decemlineata OBP gene models, 57 encode full-length proteins and 17 of these were completed using available genomic and transcriptomic raw reads due to the Nterminus, C-terminus and/or internal regions of the genes being located in assembly gaps (suffix "FIX" added to gene name). One model remains partial (OBP38CTE) because its C-terminus could not be recovered from the raw reads (suffix "CTE" for C-terminus missing). Another gene (OBP10PSE) was characterized as a pseudogene since it appears to lack a first coding exon.
Three of the genes (OBP44FJ, OBP55FJ, OBP56FJ) had exons assembled on multiple scaffolds, which for each gene could be merged into one model based on support from the raw reads (suffix "FJ" for FIX + JOIN). The automated gene prediction pipeline predicted only 16 of the 59 OBP genes correctly. Hence, 27 of the automated models had to be manually corrected, whereas 16 models had been missed entirely and thus were added manually. All of these gene models, where possible, are now annotated in the Apollo browser so that they will be available with the final protein release, but unassembled regions are not available therein.
A large proportion of OBP genes were present as tandemly duplicated arrays, with the largest tandem array comprising 10 OBP genes (LdecOBP45-54) over a 154 kb region of scaffold 160 (see Supplementary Table 13S for details). The majority of the OBP genes contained either one intron (23 genes) or five introns (26 genes). All genes with only one intron encoded the short signal peptide on the first exon, and the rest of the protein on the second exon.
The 59 LdecOBPs were aligned with all OBPs from A. glabripennis and T. castaneum as well as a selected subset of OBPs from D. melanogaster using MUSCLE followed by limited manual correction. Uninformative regions of the alignment were removed with trimAL with the independent origin of a Dimer OBP in L. decemlineata, since it is part of a large array on a single scaffold along with the Minus-C OBPs12-19, all of which are somewhat related to each other. In addition, LdecOBP17 is not related to the Dimer OBPs in D. melanogaster (which therefore were left out from the tree). LdecOBP17 had the four-cysteine motif of Minus-C OBPs repeated, and grouped in the species-diverse clade of Minus-C OBPs. The remaining 14 LdecOBPs (LdecOBP22-LdecOBP28 and LdecOBP30-LdecOBP36) were of the Classic type.
In conclusion, the size of the genomic OBP repertoire in L. decemlineata is similar (or even slightly larger) to the size in other insect species. In contrast, the OR, GR, and IR families all appear reduced in L. decemlineata in relation to other beetle species (Table 2). Assuming that the reduction of the other chemosensory gene families (i.e. the receptors) reflects the host specialization of L. decemlineata, this result suggests that the OBP family in L. decemlineata has evolved differently in terms of repertoire size change in response to ecological adaptation. This result might not be surprising since the evolution of these gene families can be highly dynamic [31,36]. The absence of a repertoire size reduction among the OBPs could be due to the multiple physiological roles of these proteins [30].

Odorant Receptors
Odorant receptors are an insect-specific radiation of receptors present in the membranes of odorant receptor neurons in the insect sensillum [33], and are believed to be the primary mechanism for detecting and transducing olfactory signals (but see ionotropic receptors, e.g., [37]). To date, the odorant receptors of insects exist as heteromers containing two members: an olfactory receptor co-receptor (Orco) and an odorant receptor protein (OR), with the latter presenting the specific binding site for odorants [33]. While Orco is highly conserved and present as a single copy in winged insects [38], the ORs of insects evolve rapidly and show little to no homology across insect orders. ORs initially appeared to cluster in seven major clades within the Coleoptera (Groups 1-7; [39,40]) but this organization is becoming less clear as more beetle receptors are described. Genes encoding putative odorant receptors were identified with iterating BLAST searches as described above, and models were built using Geneious v6.1.6 (Biomatters Ltd.). We also included the 36 ORs identified from Liu et al. [34] in our search; 25 were present in the genome, 10 proved to be redundant models, and one (OR28) could not be identified in the genome and showed no homology to known ORs. ORs have been renumbered in this manuscript to account for these discrepancies and to better organize the receptors by relatedness (see Supplemental Table 14S for correspondence of numbers).
We identified 75 OR genes in L. decemlineata, with four additional pseudogenes and a single Orco. Most OR genes included 4-6 introns, which often were quite large; complete genes frequently spanned over 30kb. Thus, many exons were situated in unsequenced regions of the genome and are missing, and only 30 full-length models were annotated (see Supplemental   Table 14S, protein sequences are included in the Supplementary Dataset 3). Three additional models (LdecOR23, 25, 54) were completed by comparison to data from Liu et al. [34]. The remaining models are missing N-terminal, internal, or C-terminal exons and are respectively denoted with NTE, INT, or CTE suffixes. When multiple suffixes were necessary, each was reduced to the single initial letters (e.g., NTE-INT becomes NI).
The complete set of LdecORs was aligned to ORs of A. glabripennis and T. castaneum, and analyzed as described above, but with untrimmed sequences. The phylogeny revealed that LdecORs could broadly be classified into the same broad groups defined by cerambycids and tenebrionids (Supplemental Figure 6S). We noted fewer and smaller radiations in L.
decemlineata relative to the other two beetle species, which resulted in fewer overall ORs (A. glabripennis: 121+11PSE, T. castaneum: 262+79PSE). However, as this is only the third genome-wide analysis of odorant receptors in the Coleoptera, the significance of this reduction is unclear.
Most LdecORs placed alongside radiations from the sister chrysomeloid A. glabripennis, with a notable exception of LdecOR52-55, which placed within a Group 3 radiation of T.
castaneum ORs. No homologs were found in L. decemlineata for Group 4, Group 6, or the isolated TcasOR71-72. In fact, the placement of LdecORs offers further support to the hypothesis that Groups 5 and 6 are expansions restricted to the Tenebrionoidea, and might be merged into a broader family that includes Group 4 and a novel chrysomeloid expansion. This clade would be defined by the conserved outgroup of AglaOR101-103 and TcasOR275 [40], and here, by LdecOR56. Some small clades of receptors that were identified from previous beetle genomes [39,40] were also present in L. decemlineata by way of LdecOR45 ("Group 3A") and LdecOR77-79 (an as-yet unnamed clade identified in A. glabripennis). No ORs have been functionally characterized from beetles save for three pheromone receptors in the cerambycid Megacyllene caryae [41], so the significance of these relationships remains unknown.

Gustatory Receptors
Insect gustatory receptors (GRs) are seven transmembrane proteins which form ligandactivated ion channels [42]. They have a major role in gustation particularly for sweet and bitter taste and additional roles as receptors for CO2, pheromones, DEET, some amino acids and fatty acids and also thermal receptors (for a review, see [43]). A similar method used to identify T.
castaneum Gr genes [6] was employed, except that the GR sequences from T. castaneum, A.
glabripennis and D. melanogaster were used for the tBlastn search using Legacy BLAST available at the i5k Workspace@NAL website and the annotation was done on Apollo.
We identified 90 candidate Gr genes and three additional pseudogenes in the L.
decemlineata genome ( Table 2,  Six candidate sugar receptor genes were identified (LdecGr4-9), which are fewer than those of T.
castaneum (16 genes; and A. glabripennis (10 genes, AglaGr4-13). Roles of DmelGR5a, 64a, and 64f in sensing sugar (e.g. sucrose, maltose and trehalose) have been demonstrated [47][48][49]. This clade is one of the oldest branches of chemoreceptor genes as their orthologs have been identified in all insects, and can be related to a crustacean ancestor [50,51], suggesting the important role for the perception of carbohydrate food sources.
We identified only one candidate fructose receptor (LdecGr10NI) which has a single ortholog in D. melanogaster (DmelGr43a) and B. mori (BmorGr9) [42,52]. However, this clade is expanded in T. castaneum (10 genes, TcasGr20-28 and TcasGr183) and A. glabripennis (3 genes, AglaGr14-16) which may suggest a higher requirement for sensing fructose in the two beetles although we could not exclude the possibility that additional genes were not discovered due to gaps in L. decemlineata genome assembly.
Other highly diverse GRs are generally classified as bitter receptors [44,53]. More than putative proteins via alternative splicing, respectively. This presumably reflects gustatory adaptation for different ecological niches, e.g., feeding behavior on the Solanum plants of L.

decemlineata.
Another interesting aspect for the beetle Gr genes is that some loci encode multiple proteins by alternative splicing. For example, LdecGr48 locus has 13 potential long N-terminal exons, all alternatively-spliced into a shared C-terminus exon and, in T. castaneum, TcasGr124 locus potentially encode 24 intact proteins [6]. None of T. castaneum and L. decemlineata odorant receptor genes are alternatively spliced [39], which could partly be explained by the strict gene regulation control of insect Or genes (only one or few Or genes per neuron) [57][58][59].
Multiple gustatory receptors in the same neuron may contribute to the discrimination of tastants with similar structures.

Ionotropic Receptors
The IR family of chemoreceptors is a variant lineage of the ancient ionotropic glutamate recepotors involved in both olfaction and gustation [60]. It consists of 27 genes (Table 16S and decemlineata there is a single ortholog of Ir41a, which is involved in sensing polyamines in D.
melanogaster [65], whereas the Ir75 clade has siz genes involved in sensing various acids in D.
melanogaster [66,67]. Finally, there is a set of highly divergent genes, only one of which has a clear relationship with the D. melanogaster IRs, Ir100a, a receptor of unknown function. Most are intronless, and those that have introns have idiosyncratically located introns in different phases, so likely were gained relatively recently (Table 16S). This pattern of mostly intronless genes is typical for these divergent IRs in other insects.
Following an approach initiated with the termite Zootermopsis nevadensis [51], and applied to several other insects including A. glabripennis and T. castaneum [2], the conserved IRs are named for their D. melanogaster orthologs, the Ir41 and 75 lineages are named with suffices a and a-f, while the divergent genes, except Ir100a, are numbered from 101 to avoid any confusion with the D. melanogaster IR names which only go to 100a. As discussed briefly in McKenna et al.
[2], the T. castaneum IRs are far more numerous than reported by Croset et al.
[61], with 53 divergent IRs added to their gene set for a new total of 80 genes, eight of which are clearly pseudogenes, and the complete set of these proteins with this modified naming system is included along with the L. decmlineata IR proteins (provided in Supplementary Dataset 3).
The conserved L. decmlineata IR genes were particularly difficult to model. The fractured nature of the assembly, combined with the fact that they are generally large genes spanning up to 100kb and have many short exons, caused problems ranging from missing or truncated exons in gaps, to genes split across scaffolds, including two instances of exons on different scaffolds being interdigitated with one another. By focusing on conserved sequence and RNAseq support, fairly complete models were constructed for most IR genes (provided in Supplementary Dataset 3). However, many of these IR genes cannot be properly modeled in Apollo and hence are only partially available at the i5k WorkSpace@NAL. Thus, four of the 14 conserved genes required repair of the assembly, commonly using raw RNAseq reads (the RNAseq reads mapped in Apollo are not useful when an exon is missing or a gene is split across scaffolds or otherwise misassembled), as well as raw genome reads (Table 16S). Two genes still have internal exons missing, while the N-terminus remains unclear for three genes. Official models existed for at least parts of most of the conserved genes, but the divergent genes, despite commonly being intronless, were usually only represented in the AUGUSTUS gene set. Twelve entirely new gene models were created (Table 16S). glabripennis, respectively ( Figure 8S). Furthermore, this is the only tandemly-duplicated pair of genes in this family (Table 16S). In addition, it appears that several divergent IR lineages have been lost from L. decemlineata. These divergent IRs in the beetles are related to two clades of IRs in D. melanogaster, the Ir7/11a clade of genes expressed in larval and adult gustatory organs [61] and the much larger Ir20a clade also implicated in gustation rather than olfaction [60,68,69], but branch support levels are low within this set of divergent IRs. Ligands are not known for any of these divergent IRs in D. melanogaster, but this considerable potential gustatory receptor contraction in L. decemlineata is in line with the contraction of the GR family, and is presumably related to the host specialization of this beetle.

Figure 9S
. Phylogenetic tree of cys-loop ligand-gated ion channels in Leptinotarsa decemlineata. The tree was constructed using the Neighbor-joining method in MEGA6, using Poisson correction and 1,000 replicates bootstrap replicates. Figure 10S. Representation of the exon structure of GABAR1 and the alternative versions of exon 3 and 6. Figure 11S. Phylogenetic tree of cuticle proteins from Leptinotarsa decemlineata. The tree was constructed using the Neighbor-joining method in MEGA6, using Poisson correction and 1,000 replicates bootstrap replicates. Figure 12S. Phylogenetic tree of cytosolic GST proteins from Leptinotarsa decemlineata. The tree was constructed using the Neighbor-joining method in MEGA6, using Poisson correction and 1,000 replicates bootstrap replicates.