Main

Soil-transmitted helminths (STHs), including Ascaris, Trichuris and hookworms, cause neglected tropical diseases affecting >1 billion people worldwide1,2. Hookworms alone infect approximately 700 million people, primarily in disadvantaged communities in tropical and subtropical regions, causing a disease burden of 1.5–22.1 million disability-adjusted life years3. N. americanus represents 85% of all hookworm infections4 and causes necatoriasis, characterized clinically by anemia, malnutrition in pregnant women, and an impairment of cognitive and/or physical development in children5.

The life cycle of N. americanus commences with eggs being shed in the feces of infected people. Eggs embryonate in soil under favorable conditions, and then the first-stage larvae hatch, feed on environmental microbes and molt twice to become infective third-stage larvae (iL3). These larvae infect the human host by skin penetration, enter subcutaneous blood and lymph vessels, and travel via the circulation to the lungs. The iL3 break into the alveoli and migrate via the trachea to the oropharynx, after which they are swallowed and travel to the small intestine, where they develop to become dioecious adults. The adult worms (1 cm long) attach to the mucosa, where they feed on blood (up to 30 μl per day per worm), and can survive in the human host for up to a decade. The pre-patent period of N. americanus is 4–8 weeks, and a female worm can produce up to 10,000 eggs per day.

New methods to control hookworm disease are urgently needed. Present therapy relies mainly on mass treatment with albendazole6, but repeated and excessive use of this agent has the potential to lead to treatment failures7 and drug resistance8. Recent indications of reduced cure rates in infected humans9 imply an urgent need for new intervention strategies. Early attempts to use bioinformatic approaches for the discovery of immunogens were hampered by a lack of understanding of the molecular biology of N. americanus and other hookworms4 and by the absence of genome and proteome sequences. A recent study10 has shown that comparative genomics facilitates the characterization and prioritization of anthelmintic targets, which results in a higher hit rate than conventional approaches.

In addition to a need for anti-hookworm vaccines in countries with high rates of hookworm infections, hookworms and other helminths are being explored as treatments (probiotics) against immunological diseases in humans in many industrialized countries where hookworm infections are not endemic11. Recent studies12,13,14 indicate that hookworms suppress the production of pro-inflammatory molecules and promote anti-inflammatory and wound-healing properties, suggesting a mechanism by which worms reside for long periods in humans and suppress autoimmune and allergic diseases. Indeed, hookworm recombinant proteins have been tested in clinical trials for noninfectious diseases15.

We sequenced, assembled and characterized the N. americanus genome and compared it with those of other nematodes and the human host. Bioinformatic analyses of the protein-coding genes identified salient molecular groups, some of which may represent new intervention targets. The production and screening of a hookworm protein microarray revealed previously undescribed features of the immune response to the parasite and enabled a postgenomic exploration of the genome sequence. In the postgenomic analysis, we identified molecules that have low similarity to proteins in other species but are recognized by all infected individuals and therefore have high diagnostic potential.

Results

Genome features

The nuclear genome of N. americanus (244 megabases (Mb)) was assembled, with 11.4% (1,336) of the supercontigs (≥1 kb) comprising 90% of the genome. The 244-Mb sequence was estimated to represent 92% of the N. americanus genome (Table 1, Supplementary Figs. 1, 2, 3 and Supplementary Note). The GC content was 40.2%, the amino acid composition was comparable to that of other species (including five nematodes, the host and two outgroups; Supplementary Table 1) and the repeat content was 23.5%. In total, 669 repeat families were predicted and annotated (Supplementary Table 2 and Supplementary Note). The protein-encoding genes predicted (n = 19,151) represent 33.7% of the genome at an average density of 78.5 genes per Mb and a GC content of 45.8%.
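The summary statistics in this paragraph follow from simple arithmetic on the assembly. The sketch below is illustrative only: it reproduces the reported gene density from the figures given in the text and shows one way to count how many of the longest supercontigs are needed to reach a given fraction of the assembly. The function names and the toy length list are assumptions made for this example and are not taken from the authors' pipeline.

```python
def gene_density_per_mb(n_genes, assembly_mb):
    """Genes per megabase of assembled sequence."""
    return n_genes / assembly_mb


def contigs_covering(lengths, fraction):
    """Smallest number of the longest supercontigs whose summed length
    reaches `fraction` of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count
    return len(ordered)


# Figures from the text: 19,151 predicted genes in a 244-Mb assembly.
print(round(gene_density_per_mb(19_151, 244), 1))  # 78.5

# Hypothetical supercontig lengths (kb), used only to exercise the function.
toy_lengths = [500, 300, 150, 30, 10, 5, 5]
print(contigs_covering(toy_lengths, 0.90))  # 3
```

Applied to the real supercontig lengths, the second function would return the count reported in the text as 1,336 supercontigs comprising 90% of the genome.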

Table 1 Summary of N. americanus genomic features

Compared to those of Caenorhabditis elegans, N. americanus exons were shorter and the introns were longer (Fig. 1a), but the average intron length and count for genes orthologous between the two species were not significantly different (P = 0.65 and 0.69, respectively; Fig. 1a,b and Supplementary Note). However, introns in C. elegans genes that were orthologous to N. americanus genes were significantly longer than introns in nonorthologous C. elegans genes (P < 1 × 10−15; Fig. 1c). This may indicate a diversity of function for these genes, as longer introns are thought to contain functional elements in addition to what might be regarded as 'normal' intron structure16. Furthermore, N. americanus iL3-overexpressed genes had longer introns than adult-overexpressed genes (Fig. 1b), which may indicate a greater diversity of regulation for these gene sets16. Positional bias was observed for intron length, which was comparable to C. elegans position-specific intron lengths for orthologous genes (Fig. 1c and Supplementary Note).

Figure 1: Organization of N. americanus gene features compared to C. elegans.

(a) The average exon in N. americanus genes is significantly (P < 1 × 10−10) shorter and the average intron is significantly (P < 1 × 10−10) longer than in C. elegans genes. (b) Orthologous (orth.) genes have significantly (P < 1 × 10−10) more introns than nonorthologous genes in both species. (c) In orthologous genes from C. elegans, introns are longer at every intron position compared to nonorthologous genes. In a–c, error bars indicate s.e.m. (d) N. americanus genes that are in operons and conserved with C. elegans are shown on the C. elegans chromosomes.

Most genes (82.6%) were confirmed using RNA sequencing (RNA-seq) data from the iL3 and adult stages of N. americanus (two biological replicates per stage), and 6.5% and 3.7% were overexpressed in these stages, respectively (Supplementary Figs. 4 and 5, and Supplementary Table 3). Alternative splicing was detected for 24.6% (4,712) of the genes, of which 68.3% have orthologs in C. elegans. Among N. americanus genes with C. elegans orthologs, the alternatively spliced genes were more likely than other genes to belong to orthologous groups for which more than half of the C. elegans genes were also alternatively spliced (P = 0.037, binomial distribution test). As expected, genes associated with alternative splicing had a higher number of exons than those without (P < 10−15 and 2 × 10−7 for N. americanus and C. elegans, respectively). A total of 3,223 N. americanus genes were predicted to be trans-spliced, of which 818 had conserved gene order and orientation with 373 C. elegans operons (Fig. 1d, Supplementary Figs. 6 and 7, Supplementary Table 4 and Supplementary Note). The expression profiles of genes within operons were significantly more similar to one another than to those of random subsets of non-operon genes (P < 0.0001), supporting the idea that they are co-transcribed under similar regulatory control17.

The N. americanus predicted secretome (classical secretion, 1,590 proteins; nonclassical secretion, 4,785 proteins) represented 33% of the deduced proteome. Functional annotation of predicted proteins on the basis of sequence comparisons identified 4,961 unique domains and 1,411 Gene Ontology terms for 57% and 44% of the N. americanus genes, respectively, and annotations were provided for 68% of the predicted N. americanus proteins (Supplementary Table 5).

Transcript expression in infective and parasitic stages

Hookworms spend a considerable amount of time as free-living larvae in the external environment before transitioning to parasitism. Differences in gene expression between these stages reflect this developmental progression (Supplementary Table 3 and Supplementary Fig. 5). Of the 1,948 differentially expressed genes, 36% were significantly overexpressed (according to EdgeR, q = 0.05) in iL3, and 64% in adult. Compared to iL3-overexpressed genes, nearly twice as many of the adult-overexpressed genes were specific to N. americanus (58% compared to 32%, P < 10−15), suggesting that species-specific genes are more likely to be related to parasitism rather than to the nonparasitic iL3 stage18.

Among the iL3-overexpressed genes, eight molecular functions were over-represented (P < 0.01), including signal transduction, transmembrane receptor activity and anion transporter activity, reflecting the ability of iL3 to adapt to a complex environment and infect a suitable host (Fig. 2a, Supplementary Table 6 and Supplementary Note). This finding is supported by the enrichment of genes encoding G protein–coupled receptor proteins among iL3-overexpressed genes (P = 5.1 × 10−8) and their under-representation among adult-overexpressed genes (P = 4.1 × 10−7) (Supplementary Fig. 8). Consistent with observations in other parasitic nematodes19, serine/threonine protein kinase activity was also enriched among iL3-overexpressed genes (P = 0.008). The complexity of transcription regulatory activities is likely to be high in iL3, as evidenced by the enrichment of genes annotated with “sequence-specific DNA binding transcription factor activity” (GO:0003700; P = 1.7 × 10−14) and genes with alternative splicing (P < 2 × 10−13), and by the fact that most (92.5%) of the differentially expressed transcription factors were iL3 overexpressed (Supplementary Note). This iL3-stage enrichment of transcription factor–related activity might indicate that transcription factors are poised for rapid gene expression after host invasion (that is, gene expression is not active but is likely to be primed, as observed in arrested stages of C. elegans20).

Figure 2: Molecular functions enriched among N. americanus genes, stage-enriched genes and the N. americanus degradome.

(a) 'Molecular function' gene ontology terms enriched in specific life-cycle stages and in N. americanus compared to other species. Included are (i) categories enriched in the iL3 or adult life cycle stages in N. americanus, (ii) categories significantly (P ≤ 1 × 10−5) over-represented or depleted in N. americanus compared to at least two of the comparison species, and (iii) second-order root nodes. TF, transcription factor. (b,c) Expression profiling of N. americanus proteases with C. elegans orthologs (b) or with no C. elegans orthologs (c).

In contrast, in the adult stage, we detected overexpression of transcripts for a broad spectrum of enzymes including proteases, hydrolases and catalases (Supplementary Table 6). This reflects the nutritional adaptation of adult worms to a high-protein diet of blood21 (Fig. 2, Supplementary Fig. 9 and Supplementary Note). Proteins with a signal peptide (SP) for secretion had transcripts that were enriched among adult-overexpressed genes (P < 10−15), whereas transmembrane domain–containing proteins (P = 1.2 × 10−8) had transcripts enriched among iL3-overexpressed genes. Proteases and protease inhibitors were enriched among SP-containing genes, and proteases contributed substantially to the predicted secretome (Supplementary Table 6 and Supplementary Note), with 55% of all proteases (325 of 592) predicted to be secreted. Proteases, particularly N. americanus–specific proteases with no orthologs in C. elegans, were overexpressed more often in adult than in iL3 (P < 10−15 for both comparisons; Fig. 2b,c, Supplementary Note and Supplementary Table 7). Serine-type endopeptidase inhibitor activity, required to protect the adult stage from the digestive and immunologically hostile environment in the host22, was adult enriched (P = 1.6 × 10−4). The adult enrichment of genes encoding structural constituents of the cuticle (P = 1.7 × 10−5) also relates to protecting the parasite from the host23.

Blood feeding in adult hookworms is facilitated by an anticoagulation process and degradation of blood proteins by proteases. Known hookworm anticoagulants24 are dominated by single-domain serine protease inhibitors (SPIs). We annotated 87 SPIs in N. americanus, accounting for 8 of 17 protease inhibitor clans. Given that serine proteases in humans are involved in diverse physiological functions, including blood coagulation and immunomodulation, the diversity of SPIs in N. americanus is probably crucial not only for anticoagulation during blood feeding but also for long-term survival in the host. Specifically, SPIs are likely to protect adult worms from enzymes in the small intestine, where serine proteases, including trypsin, chymotrypsin and elastase, are prominent25, thus mediating hookworm-associated growth delay22. SPIs were enriched among the adult-overexpressed genes (P = 3.9 × 10−8), but not among the iL3-overexpressed genes (P = 0.35). Most of the SPIs characterized in hookworms were Kunitz-type molecules (Supplementary Note), but our findings suggest that multiple types of SPIs are produced by adult N. americanus in the human host. A mass spectrometry–based proteomics analysis was performed using whole adult N. americanus worms (Online Methods), and the proteins detected (Supplementary Table 7 and Supplementary Fig. 10) were also enriched for proteases (P = 4.9 × 10−7) and SPIs (P = 1.8 × 10−4), as well as proteins with SPs (P = 4.7 × 10−11) and proteins representing a wide range of Gene Ontology terms, many related to proteolysis (Supplementary Table 6 and Supplementary Note).

Pathogenesis and immunobiology of hookworm disease

N. americanus causes chronic disease and does not usually induce sterile immunity in the host. Adult hookworms are able to live in the host for several years because of their ability to modulate and evade host immune defenses13 with their excretory-secretory products, which sustain development and create a site of immune privilege26. By comparing the N. americanus genome with genomes from other nematodes, its host and distant species, we identified molecules that facilitate parasitism. Sixty percent of N. americanus genes had an ortholog in the other species studied (Supplementary Table 8, Supplementary Fig. 11 and Supplementary Note). Comparative analysis identified metalloendopeptidases as the most prominent N. americanus proteases (Fig. 2a); these proteases are probably associated with the cleavage of eotaxin and inhibition of eosinophil recruitment27, in addition to tissue penetration28 and hemoglobinolysis29. N. americanus is the only blood-feeding nematode included in the comparison, and the hierarchical structure for enriched molecular functions (Fig. 2a) revealed shared and unique patterns and subsequent functional relationships.

SCP/Tpx-1/Ag5/PR-1/Sc7 (SCP/TAPS; InterPro IPR014044; Supplementary Table 5) is a protein family inferred to be involved in host-parasite interactions (Supplementary Note). There were 137 SCP/TAPS proteins in N. americanus, representing a fourfold expansion of this protein family compared to other nematodes. More than half (69 of 137) of the N. americanus SCP/TAPS proteins were adult overexpressed (P < 10−15; Fig. 3a), and only 6 of the 137 N. americanus SCP/TAPS proteins had orthologs in C. elegans (according to Markov clustering (MCL); see Online Methods). The presence of a limited repertoire of orthologs in C. elegans suggests that nematode SCP/TAPS proteins may have originated before parasitism. Primary sequence similarity classified SCP/TAPS proteins into multiple groups (Fig. 3b,c and Supplementary Fig. 12), only some of which contained C. elegans members, suggesting independent expansion of SCP/TAPS proteins after parasite speciation. The large expansion of SCP/TAPS proteins in N. americanus suggests multiple, possibly distinct roles in host-parasite interactions. SCP/TAPS proteins have been studied extensively as hookworm drug or vaccine candidates30 and as therapeutics for human inflammatory diseases15 or stroke31 (Supplementary Note). The 96 N. americanus–specific SCP/TAPS identified here might serve as candidates for selective drug or vaccine targets32 (Supplementary Table 5).

Figure 3: SCP/TAPS gene family expansion in N. americanus.

(a) SCP/TAPS genes are enriched in the adult stage of N. americanus. (b) Schematic representation of gene structure from SCP/TAPS family members. All SCP/TAPS proteins are grouped according to their number of CAP domains and regions outside the CAP domains: single CAP domain, double CAP domain, single CAP plus miscellaneous, and double CAP plus miscellaneous. (c) Neighbor-joining clustering of all C. elegans and ungapped N. americanus SCP/TAPS genes on the basis of the full-length primary sequence similarity of the CAP domain. Data on domain representation, secretion type and stage of expression are included.

We identified a total of 336 N. americanus genes that are orthologous to previously predicted genes encoding immunogenic/immunomodulatory proteins in Ascaris suum24, along with three homologs of genes encoding transforming growth factor-β (TGF-β), an important protein in modulation of inflammation and the evolution of nematode parasitism33 (Supplementary Table 5). Additional genes in N. americanus encoding proteins inferred to be involved in host-parasite immunomodulatory interactions include macrophage migration inhibitory factor (MIF), neutrophil inhibitory factor (NIF), hookworm platelet inhibitor (HPI), galectins, C-type lectins (C-TL), peroxiredoxins (PRX) and glutathione S-transferases (GST), among others (Supplementary Note).

Prospects for new interventions

Historically, anthelmintic drugs have been discovered using in vivo and in vitro compound screens34. Recent comparative 'omics' studies (accompanied by experimental screening) in multiple nematode species10 have shown that genomic and transcriptomic data can be used to prioritize targets and raise the hit rate compared with conventional approaches. Hence, the availability of the N. americanus genome is expected to enable comparative genomic and chemogenomic studies for the prediction and prioritization of therapeutic targets. As more than half (53%) of all current drug targets35 consist of rhodopsin-like G protein–coupled receptors (GPCRs), nuclear receptors (NRs), ligand-gated ion channels (LGICs), kinases and voltage-gated ion channels (VGICs), we investigated these protein groups in the N. americanus genome to identify potential therapeutic targets (Supplementary Table 9 and Supplementary Note).

GPCRs are attractive drug targets owing to their importance in signal transduction35. We identified 272 GPCR genes in N. americanus, whereas there are nearly 1,700 GPCR genes in C. elegans. Although GPCRs are challenging to characterize at the primary sequence level and the N. americanus genome is in a draft state, there may be a biological explanation for this difference in the number of GPCRs identified, including frequent amplifications of several subfamilies of GPCRs in C. elegans relative to the closely related Caenorhabditis briggsae36. Three of the five GRAFS families of GPCRs (glutamate, rhodopsin and frizzled, but not adhesion or secretin) were found in N. americanus. The putative GPCRs were enriched for iL3 overexpression (30 genes; P = 5.1 × 10−8), with only one gene being adult overexpressed (P = 4.1 × 10−7 for under-representation). N. americanus encodes members of both major ion-channel categories (LGICs and VGICs); 224 LGICs belonging to two of the three subfamilies of LGIC (Cys-loop family and glutamate-activated cation channels) were identified, compared with 159 LGIC-encoding genes in C. elegans37. Genes encoding nicotinic acetylcholine receptor subunits (nAChR) of Cys-loop family members were also found. Nematodes have a much larger number of nAChR-α subunits than examined vertebrates (17 nAChR-encoding genes in mammals and birds, compared with 29 nAChR subunits in C. elegans38), and several anthelmintics such as levamisole39 and monepantel40 have been developed to exploit these differences. Ivermectin41 targets a subunit of glutamate-gated chloride channels that are present in N. americanus (eight genes; InterPro IPR015680); three of these genes clustered with six C. elegans glutamate-gated chloride channel genes (avr-14, avr-15 and glc-1 to glc-4; ref. 42). The lack of a clear ortholog of the ivermectin-sensitive genes within the N. americanus genome, and the underlying sequence diversity at a position correlated with direct activation by ivermectin, may explain the relative ivermectin insensitivity of N. americanus43 (Supplementary Note and Supplementary Fig. 13) compared to other nematodes44.

VGICs include sodium, potassium and calcium channels and are anthelmintic targets (for example, emodepside inhibits SLO-1 in C. elegans45 and parasitic nematodes such as A. suum46). N. americanus encodes 48 VGICs (fewer than C. elegans), including members from the major families such as 6-transmembrane (6TM) potassium channels, voltage-gated calcium channels and voltage-gated chloride channels (Supplementary Note). As in other nematodes47, voltage-gated sodium channels were not present in N. americanus.

Protein kinases are involved in numerous signal transduction pathways that regulate biological processes, and they have been a major focus for drug discovery48,49. Of the 274 N. americanus genes encoding kinases, 15 and 12 were overexpressed in iL3 and adults, respectively. Gene expression, tissue expression, conservation among nematodes and dissimilarity to human orthologs were used for prioritization10 of candidate targets (Supplementary Table 10). To evaluate current drugs and inhibitors that target homologous kinases, we also prioritized compounds from a publicly available database (Online Methods). The highest-scoring compound was a tyrosine kinase inhibitor approved for treating chronic myelogenous leukemia9. A total of 233 other compounds had the second-highest score of 5 (Supplementary Table 11), indicating that these existing drugs might be repurposed for treating neglected tropical diseases, thus minimizing development time and cost50.

Chokepoints in metabolic pathways51 were analyzed and prioritized to identify further drug targets. N. americanus encodes at least 3,976 protein-coding genes associated with 3,265 KEGG orthology terms (Supplementary Table 7), 938 (24%) of which are involved in metabolic pathways (Supplementary Fig. 14), representing 32 potentially complete modules. A total of 34% of the metabolic pathway genes were classified as chokepoints (Supplementary Table 12), of which 120 were conserved among nematodes and non-nematode species used in the comparative analysis. Chokepoint prioritization, along with requirements for a chokepoint to be an expression bottleneck in N. americanus and for lethality upon RNAi knockdown of the orthologous gene in C. elegans, prioritized eight enzymes encoded by ten distinct genes (Supplementary Tables 12, 13, 14 and Supplementary Note). Among the prioritized chokepoints is adenylosuccinate lyase (ASL; EC 4.3.2.2) (Supplementary Figs. 15, 16, 17), an enzyme involved in the purine metabolism pathway (KEGG pathway ko00230) and a chokepoint in the adenine ribonucleotide biosynthesis module (KEGG module M00049). To identify chokepoint inhibitors for repurposing, we assessed compounds from publicly available databases (449 target-compound pairs) using the same method as for kinase inhibitors. The highest-ranked candidates include compounds such as azathioprine (DrugBank DB00993), a prodrug that is converted into mercaptopurine (DrugBank DB01033) to inhibit purine metabolism and DNA synthesis (Supplementary Fig. 18, Supplementary Table 14 and Supplementary Note).
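The chokepoint concept can be illustrated with a short sketch. It assumes the commonly used definition, in which a chokepoint reaction is the only consumer of one of its substrates or the only producer of one of its products; the exact rules applied in this study are described in the Supplementary Note and may differ (for example, in how dead-end metabolites are handled). The reaction identifiers and the toy network are hypothetical.

```python
from collections import defaultdict


def find_chokepoints(reactions):
    """Return IDs of reactions that are the sole consumer of one of their
    substrates or the sole producer of one of their products.

    `reactions` maps reaction ID -> (substrates, products).
    """
    consumers = defaultdict(set)
    producers = defaultdict(set)
    for rid, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(rid)
        for metabolite in products:
            producers[metabolite].add(rid)

    chokepoints = set()
    for rid, (substrates, products) in reactions.items():
        sole_consumer = any(len(consumers[m]) == 1 for m in substrates)
        sole_producer = any(len(producers[m]) == 1 for m in products)
        if sole_consumer or sole_producer:
            chokepoints.add(rid)
    return chokepoints


# Hypothetical network: R1 and R2 are redundant routes from A to B,
# whereas R3 is the only reaction that consumes B and the only one making C.
toy_network = {
    "R1": (["A"], ["B"]),
    "R2": (["A"], ["B"]),
    "R3": (["B"], ["C"]),
}
print(find_chokepoints(toy_network))  # {'R3'}
```

In this toy network, blocking R1 alone leaves R2 as an alternative route, whereas blocking R3 would both accumulate B and starve the network of C, which is the rationale for treating chokepoint enzymes as candidate drug targets.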

Postgenomic exploration using the N. americanus immunome

The N. americanus genome enables development of postgenomic tools to investigate the immunobiology of human hookworm disease and accelerate antigen discovery for the development of vaccines and diagnostics. We developed a protein microarray containing 564 N. americanus recombinant proteins inferred from the genome (Supplementary Table 15 and Supplementary Note). The microarray was probed with sera from individuals aged 4–66 years who were residents in an N. americanus–endemic area of northeastern Minas Gerais state in Brazil. This pilot study based on 200 individuals from the youngest (<14 years of age) and the oldest (>45 years of age) age strata identified 22 antigens that were significant (P ≤ 0.05) targets of anti-hookworm immune responses (Fig. 4).

Figure 4: Serum responses to N. americanus antigens vary with age and infection intensity.

The heat map shows the immunoreactivity of 22 antigens to the IgG antibodies from groups of uninfected individuals, infected children <14 years old and infected adults >45 years old (n = 8 in each group). Duplicate crude somatic extracts from iL3 and adult stages were included as control naive antigens. Every other row represents an individual recombinant in vitro translation product. The bar chart shows the mean immunoreactivity of the three groups for each antigen, measured by mean fluorescence intensity. 'iL3' and 'Adult' labels denote stage-specific expression of indicated antigens, measured by RNA-seq data. Significant differences in antibody responses between human adults and children were detected with Student's t-test. *P < 0.05; **P < 0.01; ***P < 0.001; NS, no significant difference.

Older individuals showed stronger immunoglobulin G (IgG) responses to a larger number of secreted antigens, but these antibodies seem to have no role in killing the parasite or protecting against heavy infection. Hence, unlike immunity to other STHs of humans, protective immunity to N. americanus does not seem to develop in most individuals during adolescence. This is consistent with observations that, in Necator-endemic areas, older people often harbor the heaviest-intensity infections1,52,53. Younger individuals showed IgG responses against fewer antigens, usually with lower intensity. Thus, although antibodies are a key feature of the immune response to N. americanus and increase with host age, they do not protect individuals from infection over time.

There are probably multiple factors contributing to the absence of overall protective immunity to hookworm infection, in contrast to the age-acquired protective immunity observed with other STH infections. Detailed kinetic studies of the IgG subclasses and IgE responses to hookworm antigens represented on our protein microarray will be required to better understand the roles of these antibodies in the acquisition of immunity against hookworm13. The protein microarray can be probed with sera from individuals with different genetic backgrounds and different histories of exposure to hookworm54, as well as from animals rendered immunologically resistant to hookworm infection by vaccination with irradiated iL3 (ref. 55), thereby facilitating efforts to develop an efficacious vaccine against hookworm disease. Furthermore, secreted proteins that are recognized by most or all the infected individuals, and have weak or no homology to other nematode species, represent antigens that might form the basis of sensitive and specific serodiagnostic tests (Supplementary Note; for example, Supplementary Fig. 19).

Discussion

N. americanus is responsible for causing more disease worldwide than any other STH. The characterization of the first genome of a human hookworm is expected to facilitate future fundamental explorations of the epidemiology and evolutionary biology of hookworms as well as efforts toward the development of therapeutics to combat hookworm disease. As N. americanus is the first hookworm whose genome has been sequenced, the data presented here provide a first insight into blood-feeding nematodes of major importance for human and animal health.

Our postgenomic exploration of inferred proteomic information highlights the utility of the draft genome sequence for understanding the immunobiology of human hookworm disease and accelerating the development of vaccines and diagnostics. It is also pertinent to note that hookworms are garnering interest for their therapeutic properties against a range of noninfectious inflammatory diseases of humans. The genome sequence therefore represents a veritable pharmacopoeia—indeed, recombinant hookworm molecules have already undergone clinical trials for stroke and deep-vein thrombosis15. Thus, the N. americanus genome sequence will have broad implications. It provides many opportunities to establish postgenomic methods in the quest to develop improved interventions against this ancient and neglected parasite, as well as inflammatory diseases that are reaching epidemic proportions in industrialized societies.

Methods

Parasite material.

The Anhui strain of N. americanus was maintained56 in golden Syrian hamsters (3–4 weeks old, male) from Harlan under the George Washington University Institutional Animal Care and Use Committee–approved protocol 053-12.2, and in accordance with all Animal Welfare guidance. Adult worms were collected from intestines of hamsters infected subcutaneously with N. americanus iL3 for 8 weeks57. DNA was extracted with the QIAamp DNA Mini Kit according to the manufacturer's instructions (Qiagen). For transcriptome sequencing, two key developmental stages from a host-parasite interaction perspective, the infective L3 (iL3; environmental) and adult (parasitic) worm stages, were collected.

Sequencing, assembly and annotation.

Fragment and paired-end whole-genome shotgun libraries (3 kb and 8 kb insert sizes) were sequenced using the Roche/454 platform and assembled with Newbler58. A repeat library was generated (RepeatModeler) and repeats characterized (CENSOR59 v. 4.2.27 against RepBase release 17.03 (ref. 60)). Ribosomal RNA genes (RNAmmer61) and transfer RNAs (tRNAscan-SE62) were identified. Other noncoding RNAs were identified by a sequence homology search against the Rfam database63. Repeats and predicted RNAs were then masked using RepeatMasker. Protein-coding genes were predicted using a combination of ab initio programs64,65 and the annotation pipeline tool MAKER66. A consensus high-confidence gene set from the above prediction algorithms was generated (Supplementary Note). The size and number of exons and introns in N. americanus were determined by parsing exon sizes from gff-format annotations (considering only exon features tagged as “coding_exon”) and calculating intron sizes. These were then compared to the C. elegans genes (WS230). Differences in exon and intron lengths and numbers between species, and between orthologous and nonorthologous gene groups, were tested for significance using two-tailed t-tests with unequal variance (Supplementary Note). Two separate approaches were used to identify putative operons in N. americanus (Supplementary Note). Gene product naming was determined by BER (JCVI) and functional categories of deduced proteins were assigned67,68,69. Orthologous groups were built from 13 species using OrthoMCL70, and genes not orthologous to the other 12 species were classified as N. americanus specific.
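The exon and intron comparison described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes tab-separated GFF-style lines in which column 9 is a plain identifier grouping the exons of one transcript (real attribute syntax varies between annotation sources), and it computes only Welch's t statistic and degrees of freedom, leaving the P value to a statistics library.

```python
from collections import defaultdict
from statistics import mean, variance


def exon_and_intron_lengths(gff_lines, feature="coding_exon"):
    """Return (exon_lengths, intron_lengths) from GFF-style lines."""
    exons_by_parent = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        # GFF coordinates are 1-based and inclusive.
        exons_by_parent[cols[8]].append((int(cols[3]), int(cols[4])))

    exon_lengths, intron_lengths = [], []
    for exons in exons_by_parent.values():
        exons.sort()
        exon_lengths.extend(end - start + 1 for start, end in exons)
        # An intron is the gap between two consecutive exons.
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
            intron_lengths.append(next_start - prev_end - 1)
    return exon_lengths, intron_lengths


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical three-exon gene used only to exercise the parser.
toy_gff = [
    "\t".join(["chr1", "src", "coding_exon", "100", "199", ".", "+", "0", "geneA"]),
    "\t".join(["chr1", "src", "coding_exon", "300", "349", ".", "+", "0", "geneA"]),
    "\t".join(["chr1", "src", "coding_exon", "500", "519", ".", "+", "0", "geneA"]),
]
print(exon_and_intron_lengths(toy_gff))  # ([100, 50, 20], [100, 150])
```

In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same unequal-variance test and also returns the two-tailed P value.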

RNA sequencing.

RNA was extracted18, DNase treated and used to generate both Roche/454 and Illumina cDNA libraries (Supplementary Note) that were sequenced using a Genome Sequencer Titanium FLX (Roche Diagnostics) and an Illumina platform (Illumina, San Diego, CA), with slight modifications (Supplementary Note). The 454 cDNA reads were analyzed as previously described18. The Illumina RNA-seq data were processed71 and low-compositional complexity bases were masked72. RNA-seq reads were aligned73 to the predicted gene set, and genes with a breadth of coverage ≥50% across the gene sequence (i.e., “expressed”) were used for further downstream analysis. Expression was quantified using expression values normalized to the depth of coverage per 100 million mapped bases (DCPM). Expressed genes were subjected to further differential expression analysis using EdgeR74 (false discovery rate <0.05) in order to identify stage-overexpressed genes (Supplementary Note).
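The expression filter and normalization described above can be sketched as follows. The interval representation, the function names and the exact form of the normalization are assumptions made for illustration; the authors' implementation is described in the Supplementary Note and may differ in detail.

```python
def breadth_of_coverage(gene_length, alignments):
    """Fraction of gene positions covered by at least one read.

    `alignments` are (start, end) half-open intervals in gene coordinates.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(alignments):
        start = max(start, last_end)   # skip positions already counted
        end = min(end, gene_length)    # clip reads overhanging the gene end
        if end > start:
            covered += end - start
            last_end = end
    return covered / gene_length


def is_expressed(gene_length, alignments, min_breadth=0.5):
    """Apply the >=50% breadth-of-coverage rule stated in the text."""
    return breadth_of_coverage(gene_length, alignments) >= min_breadth


def normalized_depth(aligned_bases, gene_length, total_mapped_bases,
                     scale=100_000_000):
    """Mean depth over the gene, scaled per `scale` mapped bases (assumed form)."""
    return (aligned_bases / gene_length) * scale / total_mapped_bases


# Hypothetical 1,000-bp gene with three overlapping or overhanging reads.
reads = [(0, 300), (200, 450), (900, 1100)]
print(breadth_of_coverage(1000, reads))  # 0.55
print(is_expressed(1000, reads))         # True
```

Breadth and depth answer different questions: breadth guards against calling a gene expressed on the basis of a pile of reads at one locus, whereas normalized depth makes expression values comparable between libraries of different sizes.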

Deduced proteome functional annotation and enrichment.

Proteins were searched against KEGG75 using KAAS68 (cut-off 35 bits), and InterProScan69 was used to obtain InterPro76 domain matches and Gene Ontology67 (GO) annotations. Proteins with signal peptides77, nonclassical secretion78 and transmembrane topology77 were identified. The degradome was identified by comparison to the MEROPS79 protease unit database using WU-BLAST (identifying the best hit with E ≤ 1e−10). Enrichment of different protease groups among different gene sets (based on similarity to C. elegans) was detected using false discovery rate (FDR)-corrected binomial distribution probability tests80. GO enrichment significance comparing the iL3 and adult-overexpressed gene sets was calculated using FUNC81 at a 0.01 significance threshold after Family-Wise Error Rate (FWER) population correction81. QuickGO82 was used to analyze the hierarchical structure of significant GO categories.
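
As an illustration of an FDR-corrected binomial enrichment test of the kind described above, the following sketch computes a one-sided binomial tail probability and applies the Benjamini-Hochberg adjustment. The choice of a one-sided upper-tail test and of the Benjamini-Hochberg procedure are assumptions, as the text does not specify either; the function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing at least k
    members of a protease group among n genes at background rate p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Groups whose adjusted P value falls below the chosen FDR threshold would be reported as enriched.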

Proteomic analysis of somatic worm extract.

Whole worms were ground under liquid nitrogen before solubilization in lysis buffer, total protein was precipitated, and established methods83 were used to reduce, alkylate and tryptic-digest two 1.5 mg samples of total somatic protein. Peptide fractions were prepared before LC and mass spectral analysis (Supplementary Note). Only proteins confirmed with at least two peptides and a confidence of P ≤ 0.05 were considered identified. GO functional enrichment among the genes supported by proteomics was calculated81, using all of the genes without proteomics support as a background for comparison.

Transcription factors and binding sites.

Transcription factors in N. americanus were identified by extracting KEGG Orthology (KO) numbers from the KEGG transcription factor database (derived from TRANSFAC 7.0 (ref. 84)) and comparing them to N. americanus KOs. Documented matrices of transcription factor binding sites were downloaded from the JASPAR database85. The corresponding protein accession numbers were extracted, converted to KOs and compared to N. americanus transcription factor KOs to define a subset of N. americanus transcription factors with available binding site information. The binding site matrices of this subset of N. americanus transcription factors were used to scan the sequences of up to 500 bp downstream and upstream of differentially expressed genes using Patser.
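
The general idea of scanning flanking sequence with a binding site matrix can be sketched as below. This is a simplified log-odds scan written for illustration; it is not Patser and does not reproduce its scoring or P-value calculation. The uniform background, the pseudocount, the forward-strand-only scan and all names are assumptions.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=None, pseudocount=1.0):
    """Convert per-position base counts into log2 odds scores.

    `counts` is a list of dicts, one per motif position, e.g. {"A": 8, "T": 2}.
    Assumes a uniform background unless one is supplied.
    """
    background = background or {b: 0.25 for b in BASES}
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, site, score) for forward-strand windows scoring >= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits
```

In practice the scanned sequence would be the region of up to 500 bp flanking each differentially expressed gene, and the reverse strand would be scanned as well.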

SCP/TAPS.

Each protein was searched for the SCP/TAPS-representative protein domains86 IPR014044 (“CAP domain”) and PF00188 (“CAP”) using InterProScan69 and hmmpfam87. Phylogenetic trees were built from full-length primary sequences derived from ungapped genes using Bayesian inference88 and neighbor joining89, as previously described for other helminths32,86,90. Leaves of the tree were annotated with domain information, secretion mode and expression data, and then visualized using iTOL91.

Potential drug targets.

GPCRs, LGICs and VGICs were identified with InterProScan69. Ion channels were identified using WU-BLASTP (E ≤ 1e−10) against the C. elegans proteome (WS230). For ivermectin target characterization, sequence alignments were obtained with MUSCLE92 for the C. elegans and N. americanus orthologs within two orthologous groups (NAIF1.5_00184 and NAIF1.5_06724). Homology models for the two N. americanus orthologs (NECAME_16744 and NECAME_16780) were built with MODELLER93 using the C. elegans crystal structure as template94. For each ortholog, five models were built and the one with the lowest total function score (energy) was chosen as the model shown. Sequence alignments are colored by the ClustalX scheme in Jalview95; protein structure models are rendered in PyMOL (Schrödinger, The PyMOL Molecular Graphics System, Version 1.3r1. 2010).

Kinome and chokepoints.

N. americanus genes were screened against the collection of kinase domain models in the Kinomer database96. Custom score thresholds were applied for each kinase group and adjusted until an hmmpfam search87 came as close as possible to identifying the known C. elegans kinases. Those same cutoffs were then applied to the N. americanus gene set to identify putative kinases, as previously described97. Kinases were prioritized by adapting a previously described protocol10 (Supplementary Note).

A chokepoint of a KEGG metabolic pathway was defined as a reaction that either consumes a unique substrate or produces a unique product. The reaction database from KEGG v58 (ref. 98) was used, and the chokepoints were identified and prioritized as previously described99 (Supplementary Note). Metabolic module abundances were calculated (and normalized in DCPM) based on KAAS annotations68, and module bottlenecks were defined as reaction steps in the cascade that are both essential for module completion and of low enzyme abundance, such that they primarily constrain the overall module abundance. Homology models were aligned with their reference sequence using T-COFFEE100 and constructed with MODELLER101 using default parameters and the PDB structures with the highest sequence similarity; docking was performed using AutoDock4.2 (ref. 102) with default parameters. Chemogenomic screening for compound prioritization was performed as previously described99 (Supplementary Note).
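
The chokepoint definition given at the start of this paragraph can be illustrated with a short sketch. It treats every reaction as irreversible and uses a hypothetical in-memory mapping of reaction identifiers to substrate and product sets rather than the KEGG reaction database; the function name and toy identifiers are likewise hypothetical.

```python
from collections import defaultdict


def find_chokepoints(reactions):
    """Return the set of reaction IDs that are chokepoints.

    `reactions` maps a reaction ID to (substrates, products), each an
    iterable of compound IDs. A reaction is a chokepoint if it is the only
    consumer of some substrate or the only producer of some product.
    Assumption: reactions are irreversible as written.
    """
    consumers = defaultdict(set)
    producers = defaultdict(set)
    for reaction_id, (substrates, products) in reactions.items():
        for compound in substrates:
            consumers[compound].add(reaction_id)
        for compound in products:
            producers[compound].add(reaction_id)

    chokepoints = set()
    for reaction_ids in list(consumers.values()) + list(producers.values()):
        if len(reaction_ids) == 1:
            chokepoints |= reaction_ids
    return chokepoints
```

For example, if two reactions both convert compound A to compound B and a third is the only reaction converting B to C, only the third is reported as a chokepoint.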

Protein microarray.

In 2005, 1,494 individuals between the ages of 4 and 66 years (inclusive) were enrolled (with informed consent) into a cross-sectional study in an N. americanus-endemic area of Northeastern Minas Gerais state in Brazil, using protocols approved by the George Washington University Institutional Review Board (117040 and 060605), the Ethics Committee of Instituto René Rachou and the National Ethics Committee of Brazil (CONEP; protocol numbers 04/2008 and 12/2006). Venous blood (15 mL) was collected from individuals determined to be positive for N. americanus (Supplementary Note).

A total of 1,275 N. americanus open reading frames (ORFs) contained a classical signal peptide for secretion and had RNA-seq evidence for transcription in iL3 and/or adult worms. Of those, 623 corresponding cDNAs were successfully amplified, cloned and expressed, and the protein extracts were contact-printed without purification onto nitrocellulose glass FAST slides (Supplementary Note). The printed in vitro–expressed proteins were quality-checked using antibodies against incorporated N-terminal polyhistidine (His) and C-terminal hemagglutinin (HA) tags.

Protein arrays were blocked in blocking solution (Whatman) and probed with human sera overnight. Arrays were washed, and isotype- and subclass-specific responses were detected using biotinylated mouse monoclonal antibodies against human IgG1 (Sigma, B6775, lot 031M4751, clone 8c/6-39), IgG3 (Sigma, B3523, lot 080M4811, clone HP-6050) and IgG4 (Sigma, B3648, lot 091M4783, clone HP-6025) and biotin-conjugated mouse monoclonal anti-human IgE Fc (Human Reagent Laboratory, Baltimore, MD, HP6061B). Microarrays were scanned using a GenePix microarray scanner (Molecular Devices). The data were analyzed using the “group average” method103, whereby the mean fluorescence was considered for analysis (Supplementary Note).

URLs.

NCBI SRA, http://www.ncbi.nlm.nih.gov/sra; RepeatModeler, http://www.repeatmasker.org/RepeatModeler.html; RNAmmer, http://www.cbs.dtu.dk/services/RNAmmer/; Rfam database, http://www.sanger.ac.uk/resources/databases/rfam.html; RepeatMasker, http://repeatmasker.org/; Fgenesh, www.softberry.com/; BER, http://ber.sourceforge.net/; Seqclean, http://compbio.dfci.harvard.edu/tgi/software/; Refcov, http://gmt.genome.wustl.edu/genome-shipit/gmt-refcov/current/; PyMOL, www.pymol.org/; KEGG transcription factor database, http://www.genome.jp/kegg-bin/get_htext?ko03000.keg; JASPAR database, http://jaspar.genereg.net/; Patser, http://stormo.wustl.edu/resources.html; Kinomer, http://www.compbio.dundee.ac.uk/kinomer; SignalP, www.cbs.dtu.dk/services/SignalP/.

Accession codes.

The whole-genome sequence of N. americanus has been deposited in DDBJ/EMBL/GenBank under the project accession ANCG00000000. The version described in this paper is the first version, ANCG01000000. All short-read data have been deposited in the Short Read Archive under the following accessions: SRR036799, SRR036800, SRR036802, SRR036804, SRR036811, SRR341459, SRR341460, SRR609850, SRR609895, SRR609951, SRR610281, SRR610282, SRR611341, SRR611350. RNA-seq profiles have been deposited in Nematode.net and a browsable genome is also available at Nematode.net and WormBase.