Ticks transmit more pathogens to humans and animals than any other arthropod. We describe the 2.1 Gbp nuclear genome of the tick, Ixodes scapularis (Say), which vectors pathogens that cause Lyme disease, human granulocytic anaplasmosis, babesiosis and other diseases. The large genome reflects accumulation of repetitive DNA, new lineages of retro-transposons, and gene architecture patterns resembling ancient metazoans rather than pancrustaceans. Annotation of scaffolds representing ∼57% of the genome reveals 20,486 protein-coding genes and expansions of gene families associated with tick–host interactions. We report insights from genome analyses into parasitic processes unique to ticks, including host ‘questing’, prolonged feeding, cuticle synthesis, blood meal concentration, novel methods of haemoglobin digestion, haem detoxification, vitellogenesis and prolonged off-host survival. We identify proteins associated with the agent of human granulocytic anaplasmosis, an emerging disease, and the encephalitis-causing Langat virus, and a population structure correlated to life-history traits and transmission of the Lyme disease agent.
Ticks (subphylum Chelicerata: suborder Ixodida) are notorious ectoparasites and vectors of human and animal pathogens, transmitting a greater diversity of infectious agents than any other group of blood-feeding arthropods. Ticks are responsible for serious physical damage to the host, including blood loss and toxicosis. Tick-borne diseases result in significant morbidity and thousands of human and animal deaths annually. The genus Ixodes includes multiple species of medical and veterinary importance, most notably serving as vectors of Lyme borreliosis in North America, Europe and Asia. Lyme disease is the most prevalent vector-borne disease in the northern hemisphere1. In the USA, 22,014 confirmed human cases were reported in 2012 (ref. 2), with ∼10-fold more infections suspected3. In Europe, ∼65,500 Lyme borreliosis patients are documented annually4. In the USA, Ixodes scapularis also vectors the infectious agents that cause human babesiosis, human granulocytic anaplasmosis, tick-borne relapsing fever and Powassan encephalitis. The increased incidence and distribution of Lyme disease and other tick-borne diseases5 necessitates new approaches for vector control.
Subphyla Chelicerata (includes ticks and mites) and Mandibulata (includes insects) shared a common ancestor 543–526 million years ago (Myr ago)6. Tick life cycles differ in many aspects from those of insects (Fig. 1) and include long periods of host attachment and blood feeding, as well as months living off-host without feeding. ‘Three-host’ ticks such as I. scapularis require a host blood meal at each life stage. Feeding occurs over several days and involves a period of slow feeding followed, after mating and insemination, by rapid consumption of a large blood meal. The synthesis of flexible new cuticle is a unique feature that permits the engorgement of ixodid ticks during feeding7. Moulting occurs off-host, and the subsequent developmental stage will ‘quest’ for a new host from vegetation. I. scapularis exhibits a wide host range including small, ground-dwelling vertebrates, birds, white-tailed deer and humans.
The I. scapularis genome assembly is the first for a medically important acarine species. It affords opportunities for comparative evolutionary analyses between disease vectors from diverse arthropod lineages and serves as a resource for the exploration of how ticks parasitize and transmit pathogens to their vertebrate hosts.
The first genome assembly for a tick vector of disease
The assembly, IscaW1, comprises 570,640 contigs in 369,495 scaffolds (N50=51,551 bp) representing 1.8 Gbp, including gaps (Table 1, Supplementary Table 2). The ab initio annotation of 18,385 scaffolds >10 Kbp in length and representing 1.2 Gbp (57% of the genome) predicted 20,486 protein-coding genes, and 4,439 non-coding RNA genes (Supplementary Figs 1–6 and Supplementary Table 3). Ixodid ticks typically have haploid genomes that exceed 1 Gbp (ref. 8). In contrast, the 90 Mbp genome of the two-spotted spider mite, Tetranychus urticae, a horticultural pest, is the smallest of any known arthropod, and contains <10% transposable elements9. Repetitive DNA is estimated to comprise ∼70% of the I. scapularis genome10, reflecting an extreme case of tandem repeat and transposable element accumulation.
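The scaffold N50 quoted above is the length L such that scaffolds of length L or greater account for at least half of the assembled bases. A minimal Python sketch of that calculation follows; the function name and the toy scaffold lengths are illustrative assumptions and are not taken from the IscaW1 assembly. The last lines check the annotated fraction using the rounded figures given in the text (1.2 Gbp annotated, 2.1 Gbp genome).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (bp), for illustration only.
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_scaffolds)

# Annotated fraction from the rounded values quoted in the text.
annotated_gbp = 1.2
genome_gbp = 2.1
annotated_percent = round(100 * annotated_gbp / genome_gbp)
```

Because N50 depends only on the length distribution, it says nothing about assembly correctness; it is a contiguity summary and should be read alongside the scaffold count and the gap content.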
The I. scapularis genome possesses 26 acrocentric autosomes and two sex chromosomes (XX:XY)11,12. Fluorescent in situ hybridization (FISH)-based physical mapping was used to develop a karyotype and physical map12 (Fig. 2; Supplementary Tables 12 and 15). Mapping revealed that tandem repeat accumulation in centromeric or peri-centromeric regions, also noted in some other arthropods13, is high in I. scapularis and comprises ∼40% of genomic DNA10. The low complexity tandem repeat families, ISR-1, ISR-2 and ISR-3, account for ∼8% of the genome12 (Supplementary Text). The most abundant ISR-2 (95–99 bp; ∼7% of the genome) is localized at the near-terminal heterochromatic regions of the chromosomes (Fig. 2).
The moderately repetitive fraction of the genome (∼30% of genomic DNA10) contains numerous copies of Class I and Class II transposable elements (Supplementary Tables 13 and 14 and Supplementary Text). For example, 41 well-represented elements (that is, comprising a full-length canonical and/or consensus sequence (Supplementary Figs 7 and 8)) of the long-terminal repeat (LTR) retro-transposon family, estimated to make up <1% of the genome, were identified. Thirty-seven members of the Ty3/gypsy group were identified, with the remainder being Pao/Bel-like. Two (Mag and CsRn1) of the six well-known insect Ty3/gypsy lineages were confirmed in the tick and two new clades, Squirrel and Toxo, are likely specific to the subphylum Chelicerata (Supplementary Fig. 8). Structural characterization of elements belonging to these lineages revealed shared features that include the CCHC gag and GPY/F integrase domains, and two ORFs matching gag and pol. The LTRs possess the TG..CA pattern14 and their integration generates a duplication of 4 bp.
Non-LTR retro-transposons comprise about 6.5% of the genome. Sequence conservation and transposable element copy number suggest recent activity in the I. scapularis CR1, I and L2 clades; these elements are also abundant in birds, mammals and lizards, and the possibility of horizontal transposable element transmission warrants further investigation. The R2, RTE and LOA non-LTR retro-transposon clades found in mosquitoes and Drosophila were not identified in the tick. Seemingly intact mariner and piggyBac transposable elements were identified, indicating possible recent or active transposition, and 234 miniature inverted-repeat transposable elements (MITEs) were annotated. These MITEs range in copy number from 50 to 14,500 and occupy ∼5% of the genome. Collectively, these findings suggest a genome permissive to high repeat accumulation.
Approximately 60% of tick genes have recognizable orthologs in other arthropods, about half of which are maintained across representative species of the major arthropod lineages (Supplementary Fig. 9). Approximately 50% of the remaining genes have homologs and ∼1/5th of tick genes appear unique (T. urticae has a similar proportion of unique genes); these provide an important resource to understand tick-specific processes and develop highly selective interventions. Analysis of gene models and 20,901 tentative consensus sequences (the Gene Index Project; compbio.dfci.harvard.edu/tgi) compiled from 192,461 expressed sequence tags (ESTs) identified ∼22% of I. scapularis genes as paralogs (Supplementary Note 1 and Supplementary Table 11). This is in line with estimates for Homo sapiens (15%)15 and the nematode, Caenorhabditis elegans (20%)16. Complementary analyses of paralogs17 suggest two duplication events in I. scapularis, involving hundreds of genes that took place within the last 40 million years, consistent with the radiation of ticks through Europe, America and Africa. The tick mitochondrial genome retains the inferred ancestral arthropod organization as predicted by its phylogenetic position18 (Supplementary Fig. 10).
The genome-scale quantitative molecular species phylogeny (Supplementary Text) inferred from single-copy orthologs from OrthoDB19, confirms the expected position of Chelicerata as basal to crustaceans and insects (Fig. 3a). The rate of molecular evolution of I. scapularis genes is slightly slower than that of other representative arthropods, and considerably slower than the rapidly evolving dipterans. Quantification of shared intron positions (Fig. 3b) and lengths (Fig. 3c) among orthologs reveals that I. scapularis shares greater than 10 times more intron positions exclusively with the non-arthropod species compared with the crustacean Daphnia pulex (Supplementary Figs 11–14 and Supplementary Tables 7–10). The species tree topology is reconstructed using only intron presence/absence data, but its branch lengths reveal that I. scapularis intron positions are more similar to those of the outgroup species, than to the other arthropods. This distinction is underscored by the contrasting length distributions of shared introns; I. scapularis lengths are most similar to those of mouse and other vertebrates, and an order of magnitude greater than in D. pulex and the representative insect species analysed. Ancestral eukaryotic genes likely possessed high intron densities similar to those of modern mammals20. The tick genome, therefore, supports an intron-rich gene architecture at the base of the arthropod radiation and more similar to that of ancestral metazoans than extant pancrustaceans.
Ticks as parasites
Tick mouthparts (chelicerae and barbed hypostome) attach to and create a feeding lesion in the dermis of the host (Fig. 1b). Tick saliva consists of a complex mixture of peptides and other compounds that facilitate attachment and disarm host haemostasis, inflammation and immunity, thereby enabling prolonged blood feeding. Antimicrobials in the saliva21 presumably prevent bacterial overgrowth within the ingested blood and/or feeding lesion. Transcriptome analyses indicate that tick saliva is exceptionally diverse compared with that of haematophagous insects22. Also, genes encoding salivary gland products are evolving rapidly in comparison with other gene families, possibly due to the immune pressure imposed by the host. Notably, the genome reveals an expanded repertoire (74 proteins, 0.4% of the predicted proteome) of proteins containing a Kunitz domain (Supplementary Table 16), implicated in protease inhibition and channel-blocking activity, with roles in inhibiting coagulation, angiogenesis and vasodilation. The tick genome is the richest source of this gene family identified to date. In contrast, only 0.05% of human and 0.1% of bovine proteins have this signature domain23, while the mosquito vectors Aedes aegypti, Culex quinquefasciatus and Anopheles gambiae have only five, eight and four proteins with this domain, respectively. Other tick gene expansions of note include the lipocalins (40 genes), linked to anti-inflammatory activity in other systems24, and the metalloproteases (34 genes), which are involved in fibrin degradation and inhibition of angiogenesis25.
Ticks have evolved a novel mechanism for haemoglobin digestion. Haemolysis of host erythrocytes occurs in the midgut but the digestion of blood meal proteins takes place within specialized vesicles of midgut epithelial cells following internalization by pinocytosis (Fig. 1d). Haemoglobin digestion occurs via a cascade of proteolytic enzymes resulting in dipeptides and free amino acids that are transcytosed into the haemolymph (Supplementary Text and Supplementary Table 21). Orthologs of Ixodes ricinus haemoglobinolytic enzymes26 were identified in the I. scapularis genome that contains multiple genes for cathepsin D (three genes), cathepsin L (three genes), and serine carboxypeptidase (four genes), suggesting the relative importance of these enzymes in haemoglobin digestion. Haemoglobinolytic enzymes have also been identified in other tick species27,28, suggesting that this mode of haemoglobin digestion is widespread throughout the Ixodida. Liberated haem is transported from the digestive vesicles by transport proteins to haemosomes, unique storage vesicles where haem is detoxified by formation of haematin-like aggregates29. Thus, haemoglobinolysis in ticks is similar to that in endoparasitic flatworms and nematodes. However, tick-specific intracellular digestion in midgut epithelial vesicles and haem detoxification in specialized haemosomes could offer novel acaricide targets (Supplementary Text and Supplementary Table 21).
Haem is associated with multiple essential functions as it complexes with proteins that perform oxygen transport and sensing, enzyme catalysis and electron transfer30. However, ticks are incapable of de novo haem synthesis, and it has been proposed that they rely on haem recovery from the diet31. The identification of orthologous genes in I. scapularis for the enzymes hemF, hemG and hemH associated with the production of protohaem (Supplementary Fig. 15 and Supplementary Table 20) suggests these may be remnants of a once functional haem synthesis pathway that became redundant following adaptation to a blood diet. In the absence of de novo synthesis, haem storage in ticks is likely essential, especially during the extended periods that occur between blood feeding and during egg development. In ticks, two families of storage proteins ensure haem availability and protect against the toxicity of a haem-rich diet: haemlipoglyco-carrier proteins (CPs) and the yolk proteins, vitellogenins (Vgs)32 (Fig. 1d). CPs are predominant in all tick developmental stages except the embryo. In contrast, Vg is produced in the fat body and midgut of adult females during vitellogenesis (Fig. 4), and is transported via the haemolymph to the developing oocytes where it is stored as vitellin. Vitellin is the main protein in the egg and the likely source of haem for developing embryos33. Ten putative CP genes, the most described from a tick to date, and two Vg genes were identified in the I. scapularis genome (Supplementary Fig. 16 and Supplementary Table 22).
The genome contains orthologs for at least 39 invertebrate neuropeptide genes (Supplementary Tables 25–28), including peptides that regulate ecdysis, cuticle synthesis, hardening and tanning. Orthologs involved in insect moulting34, that is, corazonin, eclosion hormone, cardioactive peptide and bursicon α and β, were identified (Fig. 4). Additional novel putative neuropeptide genes were identified based on the presence of tandem repeats in conserved C-terminal sequences, including the canonical sequences for amidation and dibasic (or monobasic) cleavage signals (Supplementary Table 25). ESTs matching corazonin, eclosion hormone and bursicon α and β were found in the synganglion transcriptome of adult Dermacentor variabilis35, which do not moult, suggesting previously unrecognized roles for these neuropeptide hormones. Companion analyses36 identified major differences in gene expression between I. scapularis and the soft tick, Ornithodoros turicata (Argasidae), in response to feeding that may explain how synganglion neuropeptides regulate the different lifestyles of the two tick families. The identification of orthologs of neuropeptides known to regulate insect moulting provides a much needed starting point to understand the regulation of development in ticks and the modification of cuticle to accommodate the approximately 100-fold increase in size that occurs during blood feeding (Fig. 4).
In ticks, over-hydration from large blood meals is counterbalanced by hormonally controlled salivary secretion into the host, presumably regulated by neuropeptides and their G-protein-coupled receptors (GPCRs) (Fig. 1c). The homologs of many insect neuropeptides, protein hormones, biogenic amines and associated GPCRs37 (Supplementary Tables 25–28) that steer processes such as diuresis, behaviour, reproduction and development38, were identified in I. scapularis. Some of the neuropeptide genes identified encode multiple neuropeptides. Of note is the extreme number of copies (19) of the kinin gene, which ranges from one to eight in other arthropods38 (Supplementary Table 28), suggesting that high peptide copy number is also needed for effective diuresis. In accordance, four kinin GPCRs are present (Supplementary Table 28). The tick has 20 GPCRs for five biogenic amines, a number similar to that for all other sequenced arthropods37, suggesting an early evolutionary origin of these molecules and a core set of highly conserved arthropod signalling molecules. Typically in insects, each neuropeptide interacts with one, or at most two, GPCRs37. Remarkably, the numbers of some neuropeptide GPCRs have expanded significantly (up to 10-fold) in I. scapularis (Supplementary Tables 26 and 28). This includes the GPCRs for AKH/corazonin-related peptide, allatostatin-A, diuretic hormones (calcitonin- and CRF-like), inotocin, kinin, pigment-dispersing-factor, sulfakinin, and tachykinin (Supplementary Table 28)37. In insects, these GPCRs are involved in regulating meal size (kinin), satiety (sulfakinin) and diuresis (kinin, tachykinin and calcitonin-like diuretic hormone)38. In ticks, the increased efficacy and fine regulation of diuresis may be accomplished through an increased repertoire of diuretic GPCRs rather than via corresponding neuropeptides, emphasizing their potential as targets for tick control.
Blood feeding is essential for reproduction in adult female ticks (Fig. 4). In lower insects, reproduction is largely regulated by juvenile hormone III. Biochemical evidence suggests that ticks do not synthesize juvenile hormone III and instead employ ecdysteroids to initiate vitellogenesis (Fig. 4, reviewed in33). In insects, the final hydroxylations for the synthesis of ecdysteroids are performed sequentially by cytochrome P450s (CYP450s) encoded by the Halloween genes (Supplementary Fig. 17 and Supplementary Table 19). Genes for all steroidogenic CYP450s except for phantom were identified in the I. scapularis genome and putative gene duplications were identified for disembodied and the spook/spookier clades, suggesting conservation of ecdysteroid regulated processes between ticks and insects. Genes for seven of the nine enzymes in the insect mevalonate pathway that produces the juvenile hormone precursor, farnesyl-pyrophosphate (farnesyl-PP), were identified in the tick genome (Supplementary Fig. 18 and Supplementary Table 18). There are five insect enzymes involved in the conversion of farnesyl-PP to juvenile hormone III. Only the gene for farnesol oxidase in the juvenile hormone branch was found in the I. scapularis genome (Supplementary Table 18) and is transcribed in the synganglion of I. scapularis and D. variabilis. The tick genome reveals a striking expansion of the methyl transferase family (44 genes) and EST data indicate that at least 26 of these are transcribed (Supplementary Fig. 19). However, the I. scapularis methyl transferases studied so far lack the juvenile hormone binding motif. An ortholog of the insect cytochrome P450 (CYP15A1) that adds the epoxide to methyl farnesoate to produce juvenile hormone III was not found in either the tick genome (Supplementary Table 18) or synganglion transcriptomes. 
The neuropeptides, allatostatin and allatotropin, which perform a variety of functions in insects, including the regulation of juvenile hormone biosynthesis, were also identified in the tick (Fig. 4). Important questions remain as to the role of the mevalonate-farnesal pathway in tick reproduction and development. In a complementary study, transcripts for genes in the mevalonate-farnesal pathway were identified from the synganglion of two hard and one soft tick species39.
The I. scapularis genome reflects a parasitic lifestyle requiring detoxification of multiple xenobiotic factors (Fig. 1a). We identified a record 206 CYP450 (Supplementary Table 23) and 75 carboxylesterase/cholinesterase-like genes, including five putative acetylcholinesterase genes (Supplementary Table 24). CYPs are haem-containing enzymes that catalyse biological oxidation reactions, many of which detoxify xenobiotics, including acaricides. In contrast, the body louse, Pediculus humanus, also an obligate blood-feeding ectoparasite, has 36 CYPs, the fewest known in an animal40, while the plant feeding mite, T. urticae has 81 (ref. 9). Carboxylesterases are also associated with metabolic detoxification in animals. While the function of these enzymes is not known, the abundance of these genes in I. scapularis may reflect the need to detoxify large blood meals from diverse hosts and toxicants encountered during off-host stages.
As a parasite that lives largely off-host, I. scapularis has developed unique mechanisms for host detection that are reflected in the genome (Fig. 1a). The sensory system in ticks includes setiform sensilla for chemo-, mechano-, thermo- and hygroreception, non-setal sensilla and dorsal light-sensing cells. Chemoreception occurs presumably through the unique Haller’s organ located on the tarsi that are presented when ticks ‘quest’ for a host. In insects, smell and taste are mediated by families of membrane receptors and extracellular ligand-binding proteins41. The chemoreceptor genes identified in the tick genome belong to the gustatory receptor and ionotropic glutamate receptor (iGluR)-related ionotropic receptor families. Sixty-two gustatory receptors were identified that fall into three major clades (Supplementary Fig. 20, Supplementary Table 29 and Supplementary Note 1). The largest of the clades (43 genes) is exclusive to I. scapularis and the relatively short branch lengths compared with those for other representative species, suggest a recent lineage-specific expansion. Although phylogenetically distant, this clade is related to the Dipteran sugar receptors and a set of three distinctive D. pulex gustatory receptors42. The second clade includes 16 tick gustatory receptors, also more closely related to the sugar receptors than to other representative gustatory receptors, with branch lengths suggesting an early diversification. The remaining clade (three genes) clusters with the largest D. pulex expansion. Of the 29 IR/iGluR genes identified, 15 are likely of the chemosensory type (ionotropic receptor) and 14 are canonical iGluRs (Supplementary Fig. 21 and Supplementary Tables 30 and 31). Members of the insect odorant receptor, odorant-binding protein (OBP) and chemosensory protein B families43 were not identified in the tick and only one member of the chemosensory protein (CSP) family was found. 
Our analysis supports the hypothesis that the origin of insect odorant receptors and OBPs occurred after the split of the lineages Hexapoda and Crustacea (∼470 Myr ago)42,44; the CSPs, however, are predicted to appear before the split of the Chelicerata and Pancrustacea lineages. Phylogenetic analyses indicate that odorant receptors belong to a divergent lineage originated from gustatory receptors, while OBPs could have derived from a CSP-like ancestor44. Both events may have occurred concomitantly as an adaptation of ancestral hexapods to the terrestrial environment (380–450 Myr ago). Chelicerate olfaction may, therefore, rely exclusively on ionotropic receptors, which are expressed in olfactory organs across Protostomia45, although it is also possible that some gustatory receptors have been recruited to this sensory function, as in Drosophila melanogaster46. Comparative transcriptomics has identified putative GPCRs, ionotropic receptors, odorant turnover enzymes and other transcripts specific to the Haller’s organ in ticks47. Evidence suggests the potential involvement of female specific cuticular lipids and a non-volatile mounting pheromone in I. scapularis during mating48. These data and morphological studies provide an emerging model for research on tick chemical communication and new control methods.
The tick possesses a small repertoire of photon-sensitive receptors compared with most insects. Genes for three opsin GPCRs were identified (Fig. 1a, Supplementary Table 26) and include orthologs of the insect putative long-wavelength sensitive ‘visual’ opsins, the honey bee ‘non-visual’ pteropsin likely involved in extraocular light detection and regulation of circadian rhythm49, as well as the D. melanogaster Rh7 opsin50. Orthologs of the insect UV and short wavelength receptors were not identified. This indicates a reduced visual system as compared with other blood-feeding arthropods (Supplementary Text) that rely heavily on visual processes during flight for location of mates, hosts and oviposition sites. During host detection, olfactory, mechano- and thermoreception may offset limited visual acuity and wavelength detection in the tick.
Ticks as vectors of pathogens and parasites
Ticks are biological vectors of viruses, bacteria and protozoa that are typically acquired via the blood meal and transmitted through saliva during feeding (Fig. 5). The tick immune system has several mechanisms to fend off pathogen invasion. Most components of the Toll, IMD (Immunodeficiency), JAK-STAT (Janus Kinase/Signal Transducers and Activators of Transcription) immune pathways and the RNA interference-antiviral signalling pathways were identified in the tick genome (Supplementary Figs 22 and 23 and Supplementary Table 17). The repertoire of immunity-related genes also includes akirins, antimicrobial peptides, caspases, defensins, oxidases, the fibrinogen-related protein family of ixoderins, lysozymes, thio-ester containing proteins and peptidoglycan-recognition proteins (Supplementary Table 17).
Multiple infection factors facilitate transmission of the Lyme disease pathogen, Borrelia burgdorferi (Fig. 5). These include the tick salivary gland proteins Salp15, Salp20, Salp25D, tick salivary lectin pathway inhibitor and tick histamine-release factor, as well as the tick receptor for OspA and tick protein tre31, and the Borrelia lipoprotein BBE31 (ref. 51). Increasingly, research is focused on interactions with Anaplasma phagocytophilum (Rickettsiales: Anaplasmataceae), the causative agent of human granulocytic anaplasmosis prevalent in the USA and Europe52. The I. scapularis proteins P11, SALP16, α1,3-fucosyltransferases and the X-linked inhibitor of apoptosis E3 ubiquitin ligase are required for A. phagocytophilum infection and transmission, and modification of the tick cytoskeleton by A. phagocytophilum increases infection53,54,55. To establish infection, A. phagocytophilum inhibits apoptosis in midgut and salivary gland cells through the JAK/STAT and intrinsic pathways56. In response, the extrinsic apoptosis pathway is induced in tick salivary glands. All known components of these pathways were identified in the tick with the exception of the Perforin ortholog (Supplementary Table 17). Systems biology analyses56 revealed that the generalized responses of tick cells to A. phagocytophilum infection include changes in protein processing in the endoplasmic reticulum and glucose metabolism. Protein misfolding is increased in infected tick cells, a possible strategy by which A. phagocytophilum evades the cellular response to infection. The subsequent activation of protein targeting and degradation reduces endoplasmic reticulum stress and prevents cell apoptosis, and may also benefit the pathogen through provision of raw materials critical for an obligatory intracellular parasite with reduced biosynthetic and metabolic capacity57. In addition, A. phagocytophilum can induce an increase in expression of antifreeze glycoproteins, enhancing I. scapularis survival in cold temperatures58, and downregulate Porin expression to inhibit apoptosis, increasing tick colonization55,56. Tick cells respond to pathogen infection by decreasing glucose metabolism and increasing Subolesin and Heat Shock Protein expression, limiting rickettsial infection59,60.
We used quantitative proteomics to further characterize tick–Anaplasma interactions and to identify differential protein expression in an I. scapularis ISE6 cell line in response to infection; 735 unique peptides, assigned to 424 different I. scapularis proteins, were identified (Supplementary Tables 32–35). In total, 83 proteins were differentially represented (50 under- and 33 over-represented; Supplementary Fig. 24 and Supplementary Table 32). Under-represented (13) and over-represented (8) proteins were identified during early infection (11–17% infected cells at 3 days post-inoculation). Most were also represented as infection advanced when the number of under- and over-represented proteins increased to 50 and 31, respectively (56–61% infected cells; 10 days post-inoculation). Analysis of protein ontology demonstrated differences between under- and over-represented proteins in both early and late infections for cell growth (adducin, spectrin and β-tubulin) and transport (Na+/K+ ATPase, voltage-dependent anion-selective channel or mitochondrial porin and fatty acid-binding protein; Supplementary Tables 32–34).
The genome of a Rickettsia (Alphaproteobacteria: Rickettsiales) species, Rickettsia endosymbiont of Ixodes scapularis (REIS), was assembled from both bacterial artificial chromosome clones and recruited whole-genome shotgun reads (available at GenBank, NZ_ACLC00000000). Phylogenomics analysis of the REIS genome, which comprises a single 1.82 Mbp chromosome and four plasmids, indicates a novel non-pathogenic species that is ancestral to all Spotted Fever Group Rickettsia species, providing a valuable resource for understanding the evolution of symbiosis versus pathogenicity61.
Much less is known about the molecular mechanisms involved with viral interactions in ticks. Research suggests the RNA interference pathway provides an important defense against virus infection in tick cells, with a significant expansion of Ago genes in comparison with insects62. In a companion proteomics study of the I. scapularis ISE6 cell line following infection with the Langat virus63, 266 differentially expressed tick proteins were identified. Functional analyses suggest perturbations in transcription, translation and protein processing, carbohydrate and amino acid metabolism, transport and catabolism responses. The majority of differentially expressed proteins were downregulated, similar to the proteomics profile described above. Interestingly, 121 differentially expressed proteins lacked homology to known orthologs, suggesting these may be unique to I. scapularis.
Population structure of Ixodes scapularis in North America
The restriction-site-associated DNA sequencing (RADseq) technique was employed for genome-wide discovery of single-nucleotide polymorphisms (SNPs) and examination of genetic diversity within and among eight I. scapularis populations from the north-east, mid-west and south-east regions of the USA and the Wikel reference colony. F-statistics were used to assess genetic distance as evidence of selection. FIS values (range 0.003–0.012; Supplementary Table 36) suggest random mating or low levels of inbreeding among members comprising each population. Further supporting this hypothesis, among all populations, the average observed heterozygosity (Ho) per variable SNP was comparable (range 0.013–0.016) to expected heterozygosity (He) (range 0.013–0.018) and the nucleotide diversity over all SNP loci (π) (range 0.015–0.019) was comparable among samples. FST values (range 0.03–0.16; Supplementary Table 37), support a single species classification for I. scapularis across North America as previously reported64. Low-moderate genetic variation (FST=0.03–0.06) was observed among northern tick populations from Indiana, Maine, Massachusetts, New Hampshire and Wisconsin, and moderate variation (FST=0.07–0.09) among southern populations from Florida, North Carolina and Virginia. FST analyses revealed signatures of north–south structure in I. scapularis populations. Moderate-to-high genetic variation was observed between northern versus southern populations (FST=0.10–0.15). Interestingly, low genetic variation (FST=0.03–0.06) was observed between populations from the mid-west (Indiana and Wisconsin) versus the north-east (Maine, Massachusetts and New Hampshire), two areas associated with a high prevalence of human Lyme disease cases. As expected, moderate-to-high genetic variation was observed between the reference Wikel colony and field populations (FST=0.07–0.16).
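For readers less familiar with the F-statistics reported above, the following Python sketch shows the textbook Wright definitions for a single biallelic SNP: FIS compares observed with expected heterozygosity within a population, and FST compares mean within-population heterozygosity with that of the pooled population. It is an illustration under stated assumptions (equal-sized subpopulations, one locus, hypothetical allele frequencies) and does not reproduce the estimators or software used for the RADseq analysis, which this section does not specify.

```python
def expected_heterozygosity(p):
    """Hardy-Weinberg expected heterozygosity, 2p(1 - p), at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def f_is(h_obs, h_exp):
    """Inbreeding coefficient F_IS = 1 - Ho / He within one population."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


def f_st(allele_freqs):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    allele_freqs holds the frequency of one allele in each subpopulation;
    subpopulations are assumed to be of equal size (a simplification).
    """
    p_bar = sum(allele_freqs) / len(allele_freqs)
    h_total = expected_heterozygosity(p_bar)
    if h_total <= 0:
        raise ValueError("locus is monomorphic across subpopulations")
    h_within = sum(expected_heterozygosity(p) for p in allele_freqs) / len(allele_freqs)
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, not taken from the study.
similar_pops = f_st([0.50, 0.50])    # identical frequencies: no differentiation
divergent_pops = f_st([0.20, 0.80])  # strongly different frequencies
```

Under these definitions, FIS near zero corresponds to observed heterozygosity matching Hardy-Weinberg expectation, consistent with random mating, and larger FST values indicate that more of the total allelic variation lies between populations than within them.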
The population structure of I. scapularis was separately analysed using a subset of representative SNPs. Membership probabilities, interpreted as proximities of individuals belonging to each cluster, revealed five clades (Fig. 6), with clear separation of the Wikel colony from field populations. Clustering of the Indiana and New Hampshire populations, and of the Massachusetts, Maine and Wisconsin populations, indicates substantial sharing of alleles, while the Virginia, Florida and North Carolina populations may share a small number of alleles. Interestingly, the population structure suggests a genetic component associated with differences in the natural history of northern and southern I. scapularis and a correlation with the prevalence of human Lyme disease cases. The incidence of Lyme disease is greatest in the upper mid-west and north-east, where I. scapularis populations feed predominantly on deer as adults and complete the life cycle over 2 years. In contrast, southern populations exploit a wider range of vertebrate hosts and are not quiescent during winter64,65. These data provide important resources to determine the genetic basis of host preference and vector competence, and the correlation with Lyme disease transmission.
Genome-based interventions to control tick-borne disease
Prevailing methods of tick control rely heavily on the use of repellents and acaricides. Resistance to currently applied pesticides that disrupt neural signalling and tick development has prompted the search for novel targets. GPCRs represent a source of candidate targets for development of novel interventions. High-throughput target-based approaches have been employed to discover new mode-of-action chemistries that selectively inhibit the I. scapularis dopamine receptors66. The ligand-gated ion channels (LGICs) offer another rich source of targets. iGluRs play a major role in neurotransmission and chemosensory signalling within arthropods67. Twenty-nine putative iGluR genes and 32 putative cys-loop receptors were identified in the I. scapularis genome (Fig. 7, Supplementary Table 31). Among the iGluR genes, 14 encode members of the three principal subclasses of synaptic iGluRs (AMPA, Kainate and NMDA; Supplementary Fig. 21 and Supplementary Tables 30 and 31), while the remaining 15 more divergent sequences likely belong to the chemosensory ionotropic receptor subfamily (see above). The cys-loop LGIC family also contains six candidate glutamate-gated Cl− channels (GluCls), 12 nicotinic acetylcholine receptor subunits and four GABA-gated chloride channels. One histamine-gated Cl− channel gene and one pH-gated Cl− channel gene were also identified. Both the iGluR and cys-loop LGIC families contain tick-specific genes with no apparent insect ortholog. This striking divergence may contribute to the apparent ineffectiveness of some insecticides on acaricidal targets67. Classification of LGIC candidates by functional expression is underway, and an example is shown for a GluCl (Fig. 7; Supplementary Fig. 25). Selective targeting of tick LGICs and GPCRs may offer routes to new, safe and effective acaricides.
The genome sequence of I. scapularis, the first for a medically important chelicerate, offers insights into the molecular processes that underpin the remarkable parasitic lifestyle of the tick and its success as a vector of multiple disease-causing organisms. Foundational studies of genome organization and population structure will advance research to determine the genetic basis of tick phenotypes, and efforts are ongoing to discover novel chemistries that selectively disrupt molecular targets mined from the genome. This study is a pioneering project for genome research on ticks and mites of public health and veterinary importance, with efforts proposed to expand genomic resources across this phyletic group. In 2011, the National Institutes of Health approved the sequencing of additional species of hard ticks, including European and Asian Ixodes species, the soft tick Ornithodoros moubata (Family Argasidae) and the Leptotrombidium mite vector of scrub typhus (Superorder Acariformes)68 (Supplementary Table 38). The I. scapularis genome offers a roadmap for research on tick–host–pathogen interactions to achieve the goals of the One Health Initiative69 and improve human, animal and ecosystem health on a global scale.
Genome sequencing, assembly and annotation
The genome of the I. scapularis Wikel strain (Quinnipiac University, Hamden, CT) was sequenced in a joint effort by the Broad Institute and the JCVI, funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health. The genome was sequenced to approximately 3.8-fold coverage using Sanger sequencing and assembled using the Celera Assembler configured to accommodate high repeat content within the genome and heterozygosity in the donor population (Supplementary Table 1). The assembly and raw reads are available at GenBank under the project accession ABJB010000000, consisting of contig accessions ABJB010000001-ABJB011141594, and at VectorBase as IscaW1 (3 May 2012). The annotation of the I. scapularis genome was performed via a joint effort between the JCVI and VectorBase. The genome annotation release (IscaW1.4) is available at VectorBase (https://www.vectorbase.org/) and GenBank (accession ID: ABJB010000000). Forty-five bacterial artificial chromosome clones, ∼183,834 ESTs and 45 microRNAs were also sequenced and annotated (Supplementary Figs 4–6 and Supplementary Tables 4–6).
Proteomics of Ixodes-Anaplasma interactions
I. scapularis ISE6 cells were inoculated with A. phagocytophilum (human NY18 isolate) or left uninfected. Uninfected and infected cultures (n=5 independent cultures each) were sampled at early infection (11–17% infected cells (Avg±s.d., 13±2)) and late infection (56–61% infected cells (Avg±s.d., 58±2)) and used for proteomics. Protein extracts from the four experimental conditions (control uninfected early, infected early, control uninfected late and infected late; 100 μg each) were gel-concentrated and digested overnight at 37 °C with 60 ng μl−1 trypsin (Promega, Madison, WI, USA). The resulting tryptic peptides from each proteome were extracted and iTRAQ labelled for the analysis. The samples were fractionated by isoelectric focusing and each fraction was analysed by liquid chromatography-mass spectrometry/mass spectrometry (LC-MS/MS) using a Surveyor LC system coupled to a linear ion trap mass spectrometer model LTQ (Thermo Finnigan, San Jose, CA, USA). Protein identification was carried out using the SEQUEST algorithm (Bioworks 3.2 package, Thermo Finnigan), allowing optional (Methionine oxidation) and fixed modifications (Cysteine carboxamidomethylation, Lysine and N-terminal modification of +144.1020 Da). The MS/MS raw files were searched against the alphaproteobacteria combined with the arachnida Swissprot database (Uniprot release 15.5, 7 July 2009) supplemented with porcine trypsin and human keratins. This joint database contains 638,408 protein sequences. The false discovery rate of identification was controlled by searching the same collections of MS/MS spectra against inverted databases constructed from the same target databases. The alphaproteobacteria Swissprot database was used to identify Anaplasma and discard possible symbiotic bacterial sequences from further analyses.
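The idea behind false discovery rate control with inverted (decoy) databases can be sketched in a few lines of Python. The sketch assumes separate target and decoy searches, each yielding one score per spectrum match, and estimates the FDR at a score threshold as the number of decoy hits divided by the number of target hits. The function names and the scores are hypothetical; the exact filtering criteria applied in the study are not specified here and may differ.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as (#decoy hits) / (#target hits)."""
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


def lowest_threshold_at_fdr(target_scores, decoy_scores, max_fdr):
    """Return the lowest observed target score whose estimated FDR is
    at most max_fdr, or None if no threshold qualifies."""
    best = None
    # Scan from the highest score down; the estimate is not guaranteed
    # to be monotonic, so every candidate threshold is checked.
    for t in sorted(set(target_scores), reverse=True):
        if decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            best = t
    return best


# Hypothetical match scores (illustration only, not study data).
target = [4.1, 3.8, 3.5, 3.0, 2.6, 2.2, 1.9, 1.5]
decoy = [2.4, 1.8, 1.2, 0.9]
print(decoy_fdr(target, decoy, threshold=2.0))
print(lowest_threshold_at_fdr(target, decoy, max_fdr=0.05))
```

Because inverted sequences should not match real peptides, hits against them approximate the number of false matches expected among the target hits at the same score.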
Ixodes scapularis genetic diversity and population structure
Seventy-four RADseq libraries were produced from female I. scapularis representing nine ‘populations’ from the states of Florida, Indiana, Maine, Massachusetts, North Carolina, New Hampshire, Virginia and Wisconsin and the Wikel reference colony. RADseq libraries were constructed using 1 μg genomic DNA from individual ticks, separately digested with the SbfI restriction enzyme. Adaptor-ligated libraries were pooled and sequenced at the Purdue Genomics Core Facility on the Illumina HiSeq 2500 in Rapid run mode. Further analysis was performed by the Bioinformatics Core at Purdue University. Illumina reads were corrected for restriction site, clustered and de-multiplexed (sorted by barcode) using the ‘process_radtags.pl’ script of STACKS. For SNP identification, reads from each sample were separately aligned to the IscaW1 assembly using the end-to-end mode and default parameters of Bowtie2 v2.1.0. Genetic diversity within and between I. scapularis populations was calculated using 745,760 SNPs across 35,460 polymorphic loci. F-statistics were used to assess genetic distance or differentiation as evidence of selection, where FIS is the inbreeding coefficient of an individual (I) relative to the subpopulation (S) and FST is the difference in allele frequency between subpopulations (S) compared with the total population (T). The population structure of I. scapularis across North America was separately analysed using a subset of 34,693 representative SNPs (1 SNP per polymorphic locus). The ‘population’ step from STACKS was used to analyse genetic diversity and fastStructure (beta release) was used to analyse population structure. Detailed methods are available in Supplementary Text. All variation data are available at NCBI SRA (SRP065406), VectorBase and via BioMart: http://biomart.vectorbase.org.
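The definition of FST given above can be made concrete with a short Python sketch for a single biallelic SNP, using Wright's formulation FST = (HT − HS)/HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. Equal weighting of subpopulations is an assumption of this sketch, the function name is hypothetical, and the allele frequencies are invented for illustration; the estimator implemented in STACKS may differ in detail.

```python
def fst_biallelic(freqs):
    """Wright's FST = (HT - HS) / HT for one biallelic SNP.

    freqs: allele frequency of the same allele in each subpopulation.
    Subpopulations are weighted equally (an assumption of this sketch).
    """
    if not freqs:
        raise ValueError("at least one subpopulation is required")
    k = len(freqs)
    hs = sum(2 * p * (1 - p) for p in freqs) / k   # mean within-subpopulation He
    p_bar = sum(freqs) / k                         # pooled allele frequency
    ht = 2 * p_bar * (1 - p_bar)                   # total expected heterozygosity
    return 0.0 if ht == 0 else (ht - hs) / ht


# Hypothetical allele frequencies (illustration only, not study data).
print(fst_biallelic([0.50, 0.50]))   # identical frequencies: no differentiation
print(fst_biallelic([0.40, 0.60]))   # modest divergence
print(fst_biallelic([0.20, 0.80]))   # strong divergence
```

The further subpopulation allele frequencies drift apart, the larger the share of total heterozygosity explained by differences between subpopulations, and the higher the FST.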
Functional expression of tick LGICs
Expression studies were performed on mature oocytes extracted from anaesthetised female Xenopus laevis. Briefly, complementary RNA encoding IscaGluCl1 was injected at 1 mg ml−1 using a Drummond Nanoject injector into oocytes that had been treated for 20–40 min in a 2 mg ml−1 solution of collagenase type 1A (Sigma UK) in calcium-free saline. Following 3–5 days of incubation at 18 °C in saline supplemented with penicillin (100 units per ml), streptomycin (100 μg ml−1), gentamycin (50 μg ml−1) and 2.5 mM sodium pyruvate, oocytes were secured individually in a Perspex chamber (∼90 μl) and perfused continuously with saline at 5 ml min−1. They were impaled with two glass microelectrodes filled with 3 M KCl (resistance 1–5 MOhm in saline), through which the oocytes were voltage-clamped at −100 mV using an Axoclamp 2A amplifier. Solutions were applied in the perfusing saline. The saline consisted of (in mM): NaCl 100, KCl 2, CaCl2 1.8, MgCl2 1, HEPES 5, adjusted to pH 7.6 with 10 M NaOH.
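For readers preparing a comparable recording saline, the millimolar recipe above can be converted to masses with a small Python helper. This is a sketch under stated assumptions: the molar masses are those of the anhydrous salts and of HEPES free acid, whereas the source does not say which forms were used (hydrated salts would need different values), and the function and variable names are illustrative.

```python
# Molar masses in g/mol; anhydrous salts and HEPES free acid are ASSUMED,
# because the source does not state which chemical forms were used.
MOLAR_MASS_G_PER_MOL = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "CaCl2": 110.98,
    "MgCl2": 95.21,
    "HEPES": 238.30,
}

# Concentrations in mM, as listed for the recording saline.
SALINE_MM = {"NaCl": 100, "KCl": 2, "CaCl2": 1.8, "MgCl2": 1, "HEPES": 5}


def grams_needed(conc_mm, volume_l):
    """Return grams of each component for the given volume in litres."""
    return {
        name: (mm / 1000.0) * MOLAR_MASS_G_PER_MOL[name] * volume_l
        for name, mm in conc_mm.items()
    }


for name, grams in grams_needed(SALINE_MM, volume_l=1.0).items():
    print(f"{name}: {grams:.3f} g per litre")
```

The pH adjustment to 7.6 with NaOH is done after dissolving the components and is not captured by this calculation.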
Accession codes: The data reported in this paper are archived at GenBank under the project accession ABJB010000000, consisting of contig accessions ABJB010000001-ABJB011141594, and at VectorBase (IscaW1, 3 May 2012). The genome annotation release (IscaW1.4) is available at GenBank (accession ID: ABJB010000000) and VectorBase (https://www.vectorbase.org/) and RADseq data have been deposited in the NCBI Sequence Read Archive (SRA) under accession code SRP065406.
How to cite this article: Gulia-Nuss, M. et al. Genomic insights into the Ixodes scapularis tick vector of Lyme disease. Nat. Commun. 7:10507 doi: 10.1038/ncomms10507 (2016).
This project has been funded in part with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services (NIAID, NIH, DHHS) under contract numbers N01-AI30071, HHSN272200900007C, HHSN266200400001C and 5R01GM77117-5. Its contents are solely the responsibility of the authors and do not represent the official views of the NIH. Additional grants and contracts supporting work described in this manuscript were from the NIH-NIAID (HHSN266200400039C and HHSN272200900039C) to F.H.C., and a subcontract under HHSN272200900039C to C.A.H. and J.M.M., the Australian Research Council Discovery Project (DP120100240) to S.C.B. and R.S., the Ministerio de Ciencia e Innovación of Spain (BFU2007–6292; BFU2010–15484) to J.R., BIO2009–07990 and BIO2012–37926 to J.V., NIH-1R01AI090062 to Y.P., L.S., and J.K., NIH 1R21AI096268 and NSF IOS-0949194 to R.M.R., the Xunta de Galicia of Spain (10PXIB918057PR) to J.M.C.T. and M.T., BFU2011–23896 and EU FP7 ANTIGONE (278976) to J.F., the USDA-NRI/CREES (2008-35302-18820) and Texas AgriLife Research Vector Biology grant to P.V.P. and European Research Council Starting Independent Researcher Grant (205202) to R.B. J.M.R. was supported by the intramural program of the NIAID, R.M.W. by a Marie Curie International Outgoing Fellowship PIOF-GA-2011–303312, E.M.Z. by Swiss National Science Foundation awards 31003A-125350 and 31003A-143936, J.M.G. by an NIH-NCATS award TL1 TR000162 and NSF Graduate Research Fellowship (DGE 1333468), V.C. by a Boehringer Ingelheim Ph.D. Fellowship, F.G.V. by a Fundação para a Ciência e a Tecnologia, Portugal fellowship (SFRH/BD/22360/2005), C.J.P.G. and F.H. by The Lundbeck Foundation (Denmark), and J.J.G. by NIH awards HHSN272200900040C, R01AI017828 and R01AI043006. Support from the Broad Genomics Platform is gratefully acknowledged.
Supplementary Figures 1-25, Supplementary Tables 1-38, Supplementary Note 1, Supplementary Methods and Supplementary References
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/