A highly-contiguous genome assembly of the Eurasian spruce bark beetle, Ips typographus, provides insight into a major forest pest

Conifer-feeding bark beetles are important herbivores and decomposers in forest ecosystems. These species complete their life cycle in nutritionally poor substrates and some can kill enormous numbers of trees during population outbreaks. The Eurasian spruce bark beetle (Ips typographus) can destroy >100 million m3 of spruce in a single year. We report a 236.8 Mb I. typographus genome assembly using PacBio long-read sequencing. The final phased assembly has a contig N50 of 6.65 Mb in 272 contigs and is predicted to contain 23,923 protein-coding genes. We reveal expanded gene families associated with plant cell wall degradation, including pectinases, aspartyl proteases, and glycosyl hydrolases. This genome sequence from the genus Ips provides timely resources to address questions about the evolutionary biology of the true weevils (Curculionidae), one of the most species-rich animal families. In forests of today, increasingly stressed by global warming, this draft genome may assist in developing pest control strategies to mitigate outbreaks.

C onifer-feeding bark beetles (Coleoptera; Curculionidae; Scolytinae) are keystone species in forest ecosystems, contributing to wood decomposition and nutrient recycling through direct feeding and the action of beetle-associated microbiota 1 . However, a few so-called aggressive species can also kill large numbers of healthy trees through pheromonecoordinated mass-attacks once their populations surpass a critical threshold density, quickly transforming entire forest landscapes at tremendous economic and ecological costs 2,3 . Since abiotic factors are important drivers for beetle population growth 4 , outbreaks of aggressive species are expected to increase in frequency and severity due to climate change 5,6 . Beetle population growth is promoted by increasing temperatures that reduce developmental time, allowing for the production of additional generations per year. Furthermore, the resistance of conifer trees to beetle attack is compromised during heat and drought stress 7 .
The Eurasian spruce bark beetle (Ips typographus [L.]) is the most serious pest of Norway spruce (Picea abies [L.] Karst) and other spruce species in the beetle's range, currently causing unprecedented forest destruction across the Palearctic region (Fig. 1). In Europe, I. typographus killed between 2 and 14 million m 3 of spruce annually from 1970 to 2010 8 , but during the hot and dry summer of 2019, it killed more than 118 million m 3 9 . Beetle mass-attacks on trees are coordinated by an aggregation pheromone, which is produced by males as they initiate boring in suitable host trees and attracts large numbers of both sexes 10 . Pheromone attraction is modulated by other compounds produced by con-and heterospecific beetles as well as by host and non-host trees [11][12][13] . Due to the importance of volatile signals in the ecology of I. typographus, transcriptomes from antennal tissue have been generated to address the molecular biology and function of chemosensation [14][15][16] . However, genetic resources are otherwise scarce for this species, and such resources are needed to better understand the specialised ecology of I. typographus, and ultimately to develop improved pest management strategies.
Tree-killing bark beetles like I. typographus thrive on the inner bark ( Fig. 1a-d), one of the most nutritionally poor and recalcitrant organic substrates on earth 17 . Abundant carbon-rich bio-polymers present in bark and adjoining sapwood, such as lignin, hemicellulose and cellulose, may not be available to bark beetles as nutrients until prior degradation by fungi and other microbes 18 . Bark beetles, including I. typographus, are associated with a diverse microbiota, including gut endo-symbionts 19,20 , yeasts 21 , and ecto-symbionts such as ophiostomatoid fungi that the beetles inoculate into attacked trees 22 . Once established in the tree, these fungi may serve as food for developing beetles, metabolise conifer chemical defences, and accelerate tree death [23][24][25] .
Although beetles (Coleoptera) constitute the largest order of the Metazoa, only 11 Coleoptera genomes have been published, of which only two belong to members of the 6,000 species strong Scolytinae subfamily (the mountain pine beetle, Dendroctonus ponderosae Hopkins 26 , and the coffee berry borer, Hypothenemus hampei Ferrari 27 ). Sequencing of additional Scolytinae genomes would enable comparative genomic studies and shed light on taxon-specific ecological adaptations in this highly specialized and important insect group. The main aim of this study was to assemble and annotate a high-quality genome of I. typographus. Secondly, we performed a comparative genomic analysis with 20 other metazoan genomes, specifically investigating Ips-and bark beetle-specific gene family expansions. We expect that our highquality assembly of the I. typographus genome will be an important resource for forthcoming fundamental and applied studies on this important insect that endangers entire forest landscapes (Fig. 1e).

Results and discussion
Genome sequencing, assembly, and completeness assessment. A highly inbred line of I. typographus was successfully produced via full-sib mating for 10 generations. Whole-genome sequencing yielded over 4.5 million subreads totalling more than 50 gigabases of sequence equivalent to 211-fold coverage. The subread N 50 was 19.3 kb with a mean read length of 12.4 kb ( Supplementary  Fig. 1). Estimation of genome size using a k-mer-counting approach suggested a total haploid size of 221.9 Mb (Fig. 2), comparable with the mountain pine beetle (D. ponderosae) genome assembly of 204 Mb 26 . Flow cytometry has been performed on I. typographus previously to estimate genome size using both male and female beetles (n = 6). The result equates to an average genome size of 259 and 273 Mb for males and females, respectively, corroborating the results from our k-mer analysis (personal communication, J. Spencer Johnston and Anthony Cognato). The karyotype of I. typographus is 14 + Xy p 28 with the male being the heterogametic sex. Our final, phased genome sequence was 236.8 Mb in total length and comprised 272 contigs with an N 50 of 6.65 Mb; the longest contig being 16.9 Mb in size (Table 1). Seventy-eight percent of the genome assembly was contained in 36 contigs that were all greater than 1 Mb in size (Supplementary Table 1). The five largest contigs were all greater than 10 Mb in length.
Telomeric motifs are regions of repetitive sequences of DNA that indicate the ends of chromosomes. These motifs were identified at the ends of eight different contigs (five forward strands, three reverse strands). Telomeric motifs were located at one end of the five largest contigs (Supplementary Table 2). The overall GC content was 35.21%. Approximately 28.2% of the genome contained repetitive sequences when masked using a custom library (Supplementary Table 3). This was slightly higher than in D. ponderosae (between 17% and 23%) 26 , though lower than in Tribolium castaneum (42%) 29 . Analysis of completeness of the draft genome using the BUSCO tool revealed that 1,649 (99.5%) of the 1,658 genes in the insecta_odb9 database could be identified as present either partial or complete. Only nine genes (0.5%) were considered missing from the assembly. The I.  typographus assembly statistics were of a quality comparable with the release of the enhanced T. castaneum genome sequence 30 and were considerably higher than most other Coleoptera genomes published to date. Thus, our evaluation indicates that the de novo assembly of the I. typographus genome is of high quality.
Comparisons of contig sizes with other species and the presence of multiple telomeric motifs suggest some of the largest contigs contained in this assembly are approaching chromosome scale.
Genome annotation. Our approach to annotate the I. typographus genome exploited extensive transcriptomic resources and numerous publicly available protein sequences using ab initio and homology-based predictors. The final set of models consisted of 23,923 protein-coding genes ( Table 2, Supplementary Table 4) with the majority (> 77%) assigned an annotation edit distance (AED) score of 0.5 or less (Supplementary Fig. 2). At least 84% (20,094) of the gene models were supported by alignments to our RNA-Seq read data. Searches of the annotated predicted protein sequences using BUSCOs from the insecta_odb9 libraries identified 1,584 (95.5%) of the 1,658 protein sequences in the Insecta dataset, with only 38 (2.3%) sequences missing (Supplementary Figure 3). Comparing the outcome of BUSCO searches with the published Scolytinae (D. ponderosae and H. hampei) gene models, the I. typographus predictions had similar levels of completeness as these and were also comparable with the higher-quality coleopteran genome assemblies, such as the two other woodfeeding beetles (Agrilus planipennis and Anoplophora glabripennis; Supplementary Fig. 3). Moreover, read alignment of RNA-Seq data to annotated regions of the draft genome resulted in an average of 83.7% of reads mapping concordantly in pairs to the gene space. An overall average of 93.6% mapped reads indicates that the majority of transcripts captured in our RNA-Seq data are represented in the I. typographus gene set. Of the 23,923 predicted protein sequences, 59% contained Pfam domains. The number of gene models for I. typographus was considerably higher than that reported for D. ponderosae (13,088) 26 , though comparable with the annotation of A. glabripennis 31 and H. hampei 27 with 22,035 and 19,222 predicted protein-coding genes, respectively. Taken together, these statistics suggest the production of a comprehensive set of gene models for I. typographus.
Orthology and comparative analysis with other Coleopteran genomes. Comparison of orthologous genes shared among three beetle species closely related to I. typographus, two bark beetles and the polyphagous cerambycid wood-borer A. glabripennis, showed a core set of 6,991 shared gene clusters and a unique set of 811 clusters specific to I. typographus (Fig. 3a). Gene ontology enrichment analysis of these Ips-specific gene clusters revealed nine enriched categories, including DNA and RNA binding, regulation, and signalling (Supplementary Table 5). A total of 22 gene families were significantly expanded in I. typographus when compared with the 11 other published coleopteran genomes and a selection of other model invertebrates using a protein family domain analysis approach (Supplementary Table 6). A selection (17) of these expanded gene families is shown in Fig. 3b (see also Supplementary Table 7). As the other five gene families were domain model variations overlapping the reported domains, they were omitted from the figure. Using this comprehensive approach, we were able to highlight gene families that appear distinctly relevant to the ecology of I. typographus and/or coniferfeeding bark beetles. A phylogenetic tree inferred from singlecopy orthologous gene families shows the phylogenetic position of I. typographus relative to the other coleopterans included in this study (Fig. 3c). An analysis of the orthologous gene groups using the CAFE approach detected 1,065 expanded and 2,218 contracted orthologous gene groups containing 6,980 genes. Given that the CAFE approach resulted in a large proportion of genes (almost a third of all predicted Ips genes) being identified as either expanded or contracted, we discuss the expanded gene families identified using the PFam domain analysis approach to highlight gene families of interest based on function.
Feeding on the stem bark of trees requires an ability to metabolize plant cell walls using endogenous enzymes or relying on the actions of associated microbiota. Notably, genes containing domains associated with plant cell wall degrading enzymes were significantly expanded in the I. typographus genome. Large expansions of plant cell wall degrading enzymes were previously reported in the bark beetle D. ponderosae and the cerambycid wood-borer A. glabripennis 26,31 . In particular, three families of glycosyl hydrolase (GH) enzymes were expanded in the coniferfeeding bark beetles I. typographus and D. ponderosae (Fig. 3b). Genes containing the Pfam domain GH48 (PF02011) were most abundant in these species, possessing eight (I. typographus) and six (D. ponderosae) proteins containing this domain. GH48 enzymes are reducing, end-acting cellobiohydrolases and have been identified in several polyphagous insect species, but chiefly among the two coleopteran superfamilies Chrysomeloidea and Curculionoidea 32 . A detailed phylogenetic analysis of the Coleoptera and their GH gene repertoire has been reported previously 33 . The most abundant expression of these genes in I. typographus occurred predominantly in the fat body, with some genes being specifically expressed during different larval stages, but none in pupae (Fig. 4a). This finding is surprising since these genes typically are highly expressed in guts, and further experiments are needed to investigate functional roles of these proteins in the fat body. GH enzymes are also commonly found in bacteria and more rarely in fungi 34 and their presence in insects is thought to be the result of horizontal gene transfer events from bacteria 32,33,35,36 . Fungal symbionts assist I. typographus in colonising spruce trees and may play essential roles in beetle nutrition and weakening host tree defences 24,25 . GH48 enzymes are thought to also assist in the degradation of fungal chitin 37 , which may be important for I. typographus. Laboratory bioassays have shown a strong feeding preference of immature adults for media colonized by symbiotic fungi 24 . Three pairs of GH48 genes are found to occur adjacent to each other in the I. typographus genome. One of these pairs (Ityp12594 and Ityp12595) appear to be the result of a tandem duplication event ( Supplementary Fig. 4). Phylogenetic analysis of all the GH48 domain-containing genes in the 11 species of Coleoptera analysed in this study highlights that the eight genes from I. typographus share the highest similarity with those from the other Scolytinae species D. ponderosae and H. hampei (Fig. 4b).
The I. typographus genome contains 33 aspartyl protease (PF13650) domain-containing genes. Although these genes are also expanded in the wood-borer A. glabripennis, the number found in I. typographus is higher than in any of the other species included in this study, and clearly distinguishes I. typographus from D. ponderosae that has only a single gene containing this domain. Aspartyl protease is a digestive protease and has been implicated in proteolytic digestion in insects 38 . Both pectinesterase (PF01095) and polysaccharide lyase (PF14683) gene families were enriched in the three bark beetles. GH28 genes (PF00295) were also expanded in the bark beetles, but also in the wood-borer A. glabripennis and especially in the emerald ash borer, A. planipennis (Buprestidae). GH28 gene families are known to be involved in the degradation of pectin, a major component of primary plant cell walls. In several beetle species (D. ponderosae, H. hampei, A. glabripennis, and L. decemlineata) there is evidence suggesting a fungal origin of the GH28 genes 39 , references therein. Expansions of genes involved in cell wall degradation may extend the capacity of these species to digest their recalcitrant host plants efficiently. Searches of protein families within 12 species of Coleoptera revealed differences in the repertoire of genes important for detoxification and pesticide resistance. Interestingly, comparison between I. typographus and D. ponderosae revealed many similarities in gene family abundance (Table 3). Considering the well-known copious chemical defence by oleoresin of living conifer stems when invaded by bark beetles 40,41 , one would hypothesise that bark beetles would have an increased number of genes involved in xenobiotic metabolism when compared to other agriculturally important coleopterans. In contrast, the two coniferinfesting bark beetles did not display an increased number and neither did the angiosperm-feeding H. hampei (Table 3). For the nine other coleopterans, the mean total Pfam number was 528 ± 126 (9), while the mean of the three bark beetles was 327 ± 38 (3), a difference of −200 (giving a strong Hedges g standardized effect size value of −1.6 with a 95% C.I. =−3.1 to −0.2 42 ). In particular, the number of cytochrome P450 genes in the I. typographus genome (86 genes) is lower than in all the other species included in our analysis, except for H. hampei (67 genes). However, the number of P450 genes in D. ponderosae is only marginally higher than in I. typographus (93 genes), and many of these genes were shown to form species-specific expansions mainly within the CYP6 and CYP9 subfamilies 26 . Hence, it is possible that Scolytinae specialists on conifers detoxify host defences (such as abundant monoterpenes and phenolics) using a smaller, but specialised, repertoire of P450 enzymes, as compared to generalist beetles feeding on diverse angiosperms. Conifers in the pine family (Pinaceae), in particular Pinus spp., are rich in preformed terpenes, and conifer specialists might therefore be expected to have numerous P450s to detoxify these. However, I. typographus seems to be less tolerant to terpenes than Dendroctonus spp. 43 . This may appear paradoxical considering its lifestyle, but I. typographus may have a strategy of quickly reducing the levels of tree defences by mass-attack and assistance from fungal symbionts 44 . Correspondingly, Norway spruce trees have less preformed defences and rely more on their capability for induced response for their resistance against bark beetle attacks 41 .

Conclusions
We assembled and annotated the genome of the Eurasian spruce bark beetle I. typographus, an important pest of spruce across the Palearctic. This highly contiguous, phased assembly comprises 236.8 Mb of sequence in 272 contigs with an N 50 of 6.65 Mb and contains 23,923 annotated genes. Our analyses of this assembly suggest gene family expansions of primarily plant cell wall degrading enzymes, of which aspartyl protease domaincontaining genes stand out as particularly expanded. P450 enzyme-encoding genes involved in hormone synthesis and Table 3 Comparison of protein families associated with detoxification and pesticide resistance between Ips typographus and 11 other Coleoptera species. The number of proteins containing the specific Pfam domain for each family is given. catabolism of toxins are reduced in numbers. This whole-genome sequence from the genus Ips provides timely resources for population genetic studies and studies addressing important questions about the evolutionary biology and ecology of the true weevils. The genome is also expected to facilitate both fundamental and applied studies of functional genomics, as well as gene knockdown/knockout experiments. The essential odorant receptors are one example of a gene family that could be targeted to limit outbreaks, since interfering with the beetles' odour-guided host and mate finding behaviours likely will reduce their reproductive success 15,45 . Hence, this genomic resource may serve as a basis for improved and innovative management of an increasing threat to stressed trees in the Anthropocene.

Methods
Insect rearing and nucleic acid extraction. To reduce heterozygosity in the beetles before PacBio long-read sequencing, we ran 10 generations of strict sibling mating. The first mating pairs were taken from a laboratory-reared population of I. typographus originating from Lardal, Norway. The continuous culture was kept at the Swedish University of Agricultural Sciences in Alnarp, Sweden on local, freshly cut Norway spruce, following the procedures described by Anderbrant and coworkers 46 . Sexes were separated by pronotum hair density 47 and each mating pair had a 28 × 12-14 cm (length × diameter) spruce bolt providing food ad libitum 46 .
For each new sibling mating generation, we used siblings from offspring from one or occasionally two bolts (strictly separated), ensuring strict sibling mating. To reduce the risk of breeding failure, the 10 successful sibling mating generations were undertaken by using at least 10 spruce bolts per generation, with little or no observed inbreeding depression effects on fecundity or body mass over time. High molecular weight DNA was extracted from around 100 adult males from this 10× inbred population to be used for high-throughput sequencing. Beetles were directly frozen in liquid nitrogen and pulverised using a pre-chilled mini-mortar and pestle. The resulting powder was transferred with a pre-chilled spatula to a microfuge tube containing a modified lysis buffer (20 mM EDTA, 100 mM NaCl, 1% Triton® X-100, 500 mM Guanidine-HCl, 10 mM Tris, pH 7.9) and ball bearings ("stainless steel beads", Qiagen). The sample was further homogenised using a Qiagen Tis-sueLyzer by shaking for 1 min at 50 Hz. The optimal method for isolation of good quality high molecular weight DNA from I. typographus was using the Qiagen Genomic Tip 100/G kit following the manufacturers' instructions, except for a prolonged incubation time at 50°C overnight with gentle agitation using the modified lysis buffer described above instead of the standard Qiagen G2 buffer, and with the addition of Proteinase K to 0.8 mg/ml. This was supplemented with the addition of DNase-free RNase A (20 µg/ml) treatment and incubation for 30 min at 37°C, followed by centrifugation for 20 min at 12,000 × g to pellet insoluble debris. The clarified lysate was then transferred to the QBT buffer-equilibrated Genomic Tip to proceed with the standard protocol.
Genome sequencing, assembly, and evaluation. DNA samples were transported to the sequencing facility at the Uppsala Genome Center, Science for Life Laboratory, Uppsala University, Sweden for library preparation and sequencing. A pooled DNA sample from the 10 th generation inbred population was used to generate SMRT cell libraries that were sequenced on the PacBio RS II platform (Pacific Biosciences). DNA from a single non-inbred male beetle from the continuous laboratory culture described above was used to generate a 10X chromium library that was sequenced using the HiSeq X system (Illumina) to produce shortread data for a genome survey (364 million 150 bp reads,~230-fold genome coverage). Primary assembly of the PacBio long reads was performed using FALCON-kit v1.3.0 software and haplotype phasing was achieved using FALCONunzip v1.2.0 with default settings 48 . Contigs from the final assembly were aligned against each other using MUMmer 49 and redundant contigs removed. The completeness of the genome assembly was assessed using metrics derived from read alignment ( Supplementary Fig. 5) and searches of single-copy orthologs. The Benchmarking Universal Single-Copy Orthologs (BUSCO v3.0.2) 50 tool was used to search the genome assembly against the insecta_odb9 database of 1,658 genes. In order to obtain an understanding of the chromosomal structure of long contigs, telomeric motifs were identified using the script FindTelomeres.py (https:// github.com/JanaSperschneider/FindTelomeres). Genome size estimation was performed using data from the short-read genome survey and k-mer counting with Jellyfish v2.3.0 51 using GenomeScope 2.0 52 .
Transcriptome assembly and quantification of gene expression. Transcriptome libraries from different I. typographus life stages and tissue types (larval stages 1-3, pupae, adult beetle, male head, female head, callow male gut and fat body, callow female gut and fat body) were prepared at the Czech University of Life Sciences, Prague (Supplementary Table 8). All beetle samples were taken directly from logs obtained from Rouchovany, Czech Republic. Beetles were starved overnight before dissection or preservation for downstream processing. Specifically, whole beetles in different life stages (larvae, pupae, adults) were snap-frozen in liquid N 2 , and specific tissues (gut, head, fat body) were dissected under sterile conditions and kept in RNAlater™ until RNA extraction. Total RNA from whole larvae, pupae and adult I. typographus and the different tissues (pooled from several adults) was purified using the PureLink™ RNA Kit from Ambion (Invitrogen) following the manufacturer's protocol. The integrity of the purified total RNA was checked using agarose gel electrophoresis and the 2100 Bioanalyzer system (Agilent Technologies, Inc). Total RNA samples with an integrity number (RIN) > 7 were selected for sequencing. Total RNA was quantified using a Qubit 2.0 Fluorometer (Thermo Fisher Scientific) with an RNA HS Assay Kit (Invitrogen) before sending for sequencing at Novogene, China. After the initial quality control, mRNA from eukaryotic organisms was enriched using oligo(dT) beads and cDNA libraires were prepared using the NEB Next® Ultra™ RNA Library Prep Kit and sequenced on the Illumina platform to generate a minimum of 30 million 150 bp paired-end reads per sample. Five biological replicates for each sample type were sequenced. Raw reads were processed using Trimmomatic v0.36 53 and assembled using Trinity v2.8.2 54 with the default parameters except for --normalize_max_read_cov 100 --min_kmer_cov 3 --min_glue 3 --KMER_SIZE 23. Expression levels were measured by aligning quality-processed RNA-Seq reads to the genome assembly using HiSat2 v2.1.0. Aligned reads were sorted with SAMtools v1.5 55 [64][65][66] . Protein sequences from I. typographus, D. ponderosae, T. castaneum, H. hampei and A. glabripennis were compared for orthology using all-against-all alignments (E-value of 10 −5 ) and clustered using OrthoVenn2 67 with an inflation value of 1.5. The unique set of 811 Ips-specific gene clusters identified from this comparison were assigned gene ontology terms using the Uniprot database and gene ontology term enrichment was computed using a hypergeometric distribution in OrthoVenn2. Phylogenetic inference from orthologous genes was undertaken by comparing the I. typographus gene set with the protein-coding sequences from 11 other species of Coleoptera (Supplementary Table 6) using OrthoFinder v2.4.0 68 . Using this package, single copy orthologous gene groups were used to build an ultrametric maximum likelihood-rooted species-tree from multiple sequence alignments using the inbuilt MAFFT and FastTree options. The resulting tree was then visualised using FigTree v1.4.4 (https://github.com/rambaut/figtree/releases).
We used CAFE v4.2.1 69 using the default P-value thresholds to examine the expansion and contraction of the orthologous gene groups identified with OrthoFinder.
Statistics and reproducibility. Presence of genes associated with detoxification and pesticide resistance in sequenced species of Coleoptera were identified based on Pfam domains contained within their respective protein sequences. For comparison of the mean Pfam numbers of the three scolytines and nine other Coleoptera we calculated Hedges g standardized effect size, which compares a pair of means by scaling their difference by division of their pooled standard deviations, using conventional methods 42,70,71 . Other statistical tests used in this study are described in the relevant methods section. All analyses presented in this study can be reproduced from the data deposited in publicly available databases as described in the data availability section.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JADDUH000000000. The version described in this paper is version JADDUH010000000. All sequence data relating to this study are available under the BioProject accession numbers PRJNA671615 and PRJNA679450. Accompanying data, including source data used to generated figures, is permanently available from Figshare https://doi.org/10.6084/m9.figshare.14503065. All other data are available from the corresponding author on reasonable request.