Loa loa, the African eyeworm, is a major filarial pathogen of humans. Unlike most filariae, L. loa does not contain the obligate intracellular Wolbachia endosymbiont. We describe the 91.4-Mb genome of L. loa and that of the related filarial parasite Wuchereria bancrofti and predict 14,907 L. loa genes on the basis of microfilarial RNA sequencing. By comparing these genomes to that of another filarial parasite, Brugia malayi, and to those of several other nematodes, we demonstrate synteny among filariae but not with nonparasitic nematodes. The L. loa genome encodes many immunologically relevant genes, as well as protein kinases targeted by drugs currently approved for use in humans. Despite lacking Wolbachia, L. loa shows no new metabolic synthesis or transport capabilities compared to other filariae. These results suggest that the role of Wolbachia in filarial biology is more subtle than previously thought and reveal marked differences between parasitic and nonparasitic nematodes.
Filarial nematodes dwell within the lymphatics and subcutaneous tissues of up to 170 million people worldwide and are responsible for notable morbidity, disability and socioeconomic loss1. Although eight filarial species infect humans, only five cause substantial pathology: W. bancrofti, B. malayi and Brugia timori, the causative agents of lymphatic filariasis; Onchocerca volvulus, the causative agent of 'river blindness' or onchocerciasis; and L. loa, the African eyeworm. L. loa affects an estimated 13 million people and causes chronic infection usually characterized by localized angioedema (Calabar swelling) and/or subconjunctival migration of adult worms across the eye ('African eyeworm'). Complications of infection include encephalopathy, entrapment neuropathy, glomerulonephritis and endomyocardial fibrosis2. L. loa is restricted geographically to equatorial west and central Africa, where its deerfly vector (Chrysops spp.) breeds. L. loa microfilariae (L1) are acquired by flies from human blood and subsequently develop into infective larvae (L3) before being reintroduced into a human host during a second blood meal (Supplementary Fig. 1). Although L. loa is the least studied of the pathogenic filariae, it has gained prominence recently because of the severe adverse events (encephalopathy and death) associated with ivermectin treatment3 during mass drug administration campaigns in west and central Africa.
We targeted L. loa for genomic sequencing for two reasons. First, in contrast to other pathogenic filariae, L. loa lacks the α-proteobacterial endosymbiont Wolbachia. The obligate nature of Wolbachia symbiosis in W. bancrofti, B. malayi and O. volvulus has been inferred by studies in which antibiotics (for example, doxycycline) that target Wolbachia (but not the worm itself) have shown efficacy in treating humans with these infections4, 5. Through genomic analysis, Wolbachia have been hypothesized to provide essential metabolic supplementation to their filarial hosts6, 7. The absence of the Wolbachia endosymbiont in L. loa suggests that either there has been lateral transfer of important bacterially encoded genes or the obligate relationship between the endosymbiont and its filarial host is dispensable, at least under certain circumstances. Understanding the comparable adaptations of L. loa is considered essential to gain insight into the potential impact of the endosymbiont8. Second, as the most neglected of the pathogenic filariae, but one gaining clinical prominence, understanding the host-parasite relationship as it relates to the severe post-treatment reactions typical of both Wolbachia-containing and Wolbachia-free filarial parasites is of paramount importance.
Thus, we generated a draft genome sequence of L. loa and produced a refined gene annotation aided by transcriptional data from L. loa microfilariae. We also generated draft genome sequences of two of the most pathogenic (and Wolbachia-containing) filarial species, W. bancrofti and O. volvulus. This approach enabled us to more comprehensively define the genomic differences between L. loa and other filarial parasites.
Genome assemblies and repeat content
The nuclear genome of L. loa consists of five autosomes plus a sex chromosome. Using 454 whole-genome shotgun sequencing, we sequenced L. loa to 20× coverage and assembled it into 5,774 scaffolds with an N50 of 172 kb and total size of 91.4 Mb (Table 1). We sequenced the W. bancrofti and O. volvulus genomes derived from single adult worms (an unsexed juvenile adult worm for W. bancrofti and an adult male worm for O. volvulus) to 12× and 5× coverage, respectively (Table 1). Because of the low coverage of the O. volvulus genome, we did not include it in further analyses. Although the assembly sizes of the L. loa and B. malayi genomes are comparable (91.4 Mb and 93.7 Mb, respectively), the scaffold N50 of the L. loa genome is almost twice that of the B. malayi genome, making the L. loa genome assembly the most contiguous of any filarial nematode so far. The filarial genomes differ widely in repeat content (Table 1, Supplementary Tables 1–14 and Supplementary Note), with the L. loa genome being more repetitive than that of W. bancrofti but less repetitive than that of B. malayi.
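The scaffold N50 quoted above is a length-weighted median. As a minimal sketch (the function name and the toy lengths are illustrative, not taken from the assemblies), it could be computed as follows:

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    The N50 is the length L such that scaffolds of length >= L
    together contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one scaffold length")


# Toy example (hypothetical lengths, total = 300).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because the statistic is weighted by length, a handful of long scaffolds can raise the N50 substantially even when the assembly contains thousands of short ones.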
As nuclear Wolbachia transfers (nuwts) have been identified in all Wolbachia-colonized and Wolbachia-free filarial nematodes examined so far9, we expected to find similar transfers in the L. loa genome. However, a BLAST-based search of the assembled L. loa genome did not reveal any large transfers of Wolbachia DNA. A more sensitive read-based analysis determined that the L. loa genome does not have any large (>500 bp), recent transfers (Supplementary Note). It does however have small, presumably older transfers, supporting the hypothesis that L. loa was once colonized by Wolbachia but subsequently lost its endosymbiont (Supplementary Table 15 and Supplementary Fig. 2). Of the transfers that are definitively of Wolbachia ancestry and not of possible mitochondrial ancestry, there is no evidence that they are functional in L. loa (Supplementary Note).
Gene content and synteny
We produced initial gene sets for both L. loa and W. bancrofti using a combination of gene predictors with refinements to the L. loa annotation on the basis of RNA sequencing (RNA-Seq) data (Online Methods). The final L. loa gene set contained 14,907 genes, 70% of which were supported by RNA-Seq (Table 1 and Supplementary Tables 16 and 17). The W. bancrofti genome is predicted to encode 19,327 genes (Table 1 and Supplementary Note). The filarial genomes showed a high degree of synteny (Fig. 1), with 40% and 13% of L. loa genes being syntenic relative to B. malayi and W. bancrofti, respectively. Nearly all the syntenic breaks between filarial genomes occurred at scaffold ends (Supplementary Fig. 3b), suggesting that the synteny percentage detected was limited by assembly contiguity and the true level of synteny is much higher. When we compared the L. loa genome to that of Caenorhabditis elegans, orthologs from a single L. loa scaffold mapped predominantly to a single C. elegans chromosome (Fig. 1). However, only 2% of all L. loa genes were syntenic relative to C. elegans (Supplementary Fig. 3a), supporting the hypothesis that genome rearrangements during filarial evolution were mostly intrachromosomal7. There was an intermediate level of synteny (12%) between L. loa and the related nonfilarial parasite Ascaris suum (Supplementary Fig. 3a).
We were able to assign more than half of the genes encoded by the L. loa and W. bancrofti genomes to functional categories, Pfam domains, Gene Ontology (GO) terms and/or Enzyme Commission (EC) numbers (Supplementary Fig. 4 and Supplementary Tables 16 and 18). Relative to other filarial genomes, the L. loa genome is enriched (P < 0.05, Fisher's exact test) for numerous domains, including that containing pyridoxamine 5′-phosphate oxidases that synthesize vitamin B6 (Fig. 2). The L. loa genome is also enriched for numerous chemoreceptors, suggesting that L. loa may be capable of more complex interactions with its host environment than are other filarial worms (Supplementary Note). An RNA helicase domain involved in viral DNA replication is enriched in the L. loa genome; this domain was probably horizontally transferred to the L. loa genome from cyclovirus infection (Supplementary Note). Although not statistically significant, the L. loa genome encodes more hyaluronidases (six) than the B. malayi or W. bancrofti genomes (two each). Hyaluronidases are often involved in tissue penetration and could allow L. loa to move more readily through human host tissue, as L. loa adults are highly mobile, whereas B. malayi and W. bancrofti adults are commonly tethered to the lymphatic endothelium.
The genome of W. bancrofti is enriched (P < 0.05, Fisher's exact test) for genes with domains related to cellular adhesion and the extracellular matrix (for example, cadherins, laminins and fibronectins). Whether these are important in mediating the fibrosis associated with lymphatic filarial disease (for example, elephantiasis or lymphedema) in W. bancrofti infection10 or with establishing an anatomical niche within the afferent lymphatics where the adults reside awaits clarification.
Gene products associated with immunologic responses
Each filarial parasite interacts with both its definitive mammalian host and its intermediate arthropod host (Chrysops spp. in the case of L. loa) during its life cycle (Supplementary Fig. 1). The parasite is thought not only to have its own innate immune system to protect itself from microbial pathogens but also to have evolved mechanisms to exploit and/or subvert host and vector defense mechanisms. Although adaptive immune molecules such as immunoglobulins or Toll-like receptors (TLRs) are absent in L. loa and other filarial nematodes, L. loa, similarly to other filariae, seems to have a primordial Toll-related pathway (Supplementary Table 19 and Supplementary Note). The innate immune system encoded by the L. loa genome also includes C-type lectins, galectins, jacalins and scavenger receptors. L. loa contains a number of lipopolysaccharide binding proteins that have been implicated in modulating the effects of host bacteria or microbial translocation products. Similarly to B. malayi, the L. loa and W. bancrofti genomes do not encode antibacterial peptides described in C. elegans and A. suum7, suggesting that these molecules are either dispensable in filariae or too divergent to detect.
Analysis of L. loa genes identified a number of human cytokine and chemokine mimics and/or antagonists, including genes encoding macrophage migration inhibition factor (MIF) family signaling molecules, transforming growth factor-β and their receptors, members of the interleukin-16 (IL-16) family, an IL-5 receptor antagonist, an interferon regulatory factor, a homolog of suppressor of cytokine signaling 7 (SOCS7) and two members of the chemokine-like family (Supplementary Table 19). In addition, the L. loa genome encodes 17 serpins and 7 cystatins, which have been shown to interfere with antigen processing and presentation to T cells11, 2 indoleamine 2,3-dioxygenase (IDO) genes, which encode immunomodulatory proteins implicated in strategies of immune subversion, and a number of members of the Wnt family of developmental regulators, which typically modulate immune activation. The L. loa genome encodes proteins that have sequences similar to those of human autoantigens (Supplementary Note). Although some of these putative autoantigens can also be found in the other filariae, the slight expansion of them in L. loa suggests that antibodies induced by L. loa infection may be more autoreactive than those induced by other parasites.
In addition to elucidating host-pathogen interactions, pathogen genomes can be evaluated for potential drug targets, such as protein kinases. We therefore annotated protein kinases in the L. loa genome and compared them to those in other nematode genomes (Supplementary Tables 20–23 and Supplementary Fig. 5). We found numerous differences between filarial and nonparasitic nematode kinases, particularly regarding those involved in meiosis. The widely conserved TTK kinase (MPS1), which has a key role in eukaryotic meiosis12, is present in L. loa and absent in C. elegans. By contrast, filarial nematodes lack the nearly universally conserved RAD53-family kinase CHK-2, which is present in C. elegans. In most eukaryotes, RAD53 is involved in initiating cell-cycle arrest when DNA damage is detected, but in C. elegans it is essential for chromosome synapsis and nuclear rearrangement during meiosis13. This reciprocal difference suggests that meiosis in filarial parasites may be regulated in a manner more similar to that in typical eukaryotes than in C. elegans (Supplementary Note). Six L. loa protein kinases are orthologous to targets of drugs currently approved for use in humans (Supplementary Table 23), including the tyrosine kinase inhibitor imatinib, which has been shown to kill schistosomes14 and Brugia parasites of all stages at concentrations ranging from 5 to 50 μM (T.B.N., unpublished data). Therefore, repurposing already approved drugs that target these kinases may be promising in treating filarial (and other helminth) infections15.
To examine the evolution of filarial parasites in the context of other nematodes, we estimated a phylogeny from 921 single-copy core orthologs across nine nematode genomes using maximum likelihood, parsimony and Bayesian methods. All methods converged on a single topology with 100% support (either bootstrap values or posterior probabilities) at all nodes (Fig. 3). This phylogeny indicates that Meloidogyne hapla occupies a position basal to a clade of Rhabditina (C. elegans, Caenorhabditis briggsae and Pristionchus pacificus) and the Spirurina (filarial worms and A. suum). Although these results contrast with previous studies based on ribosomal subunits that placed M. hapla closer to Rhabditina than to the filarial worms16, 17, our analysis used a larger gene set and had higher nodal support values.
Relative to the genomes of nonparasitic nematodes, we identified numerous orthologs as being unique to the filarial parasites (Fig. 3). Proteins encoded by the filarial genomes showed enrichment of immunogenic domains such as extracellular and cell-adhesion domains and in a metabolic context were enriched for trehalase domains involved in trehalose degradation (q < 0.05, Fisher's exact test; Supplementary Fig. 6). Trehalose is known to be involved in the protection of nematodes from environmental stress18 and could potentially have a key role in filarial survival. Trehalose and its biosynthetic pathway have been shown to be associated with increased lifespan in C. elegans19 and might support the idea that increased use of trehalose by filarial nematodes could be related to their relatively long lifespan.
The filarial genomes lack a wide array of seven-transmembrane G protein–coupled chemoreceptors (7TM GPCRs; Supplementary Fig. 6). Profiling of 7TM GPCRs revealed a pattern of progressive loss of many families in the transition from nonparasitic to parasitic lifestyles (Fig. 4). For example, filarial nematodes and Trichinella spiralis completely lack the STR superfamily, including ODR-10, which is known to be involved in detection of volatiles20, and KIN-29, a protein kinase that regulates STR expression in C. elegans21. If the STR superfamily is more broadly involved in odorant detection, this could explain why these molecules are lacking in filarial nematodes and T. spiralis parasites that live only in aqueous environments, whereas they are retained in A. suum and M. hapla, which are exposed to volatiles in part of their life cycle. Only the SRAB, SRX, SRSX and SRW families were conserved across all nematodes, suggesting that these 7TM GPCRs mediate vital nematode functions.
Filarial genomes are also depleted in both soluble and receptor guanylate cyclases; these cyclases are involved in the regulation of environmental sensing and complex sensory integration functions (Fig. 4). However, GCY-35 and GCY-36, which are involved in the detection of molecular oxygen in solution22, are encoded in the filarial genomes. Protein kinase profiling revealed 18 receptor guanylate cyclases that are present in C. elegans but not in filarial worms, including the environmental sensors GCY-14 and GCY-22 (Supplementary Table 23). Depletion of these and other kinases involved in olfactory and gustatory sensing, including KIN-29, suggests that the environments of filarial nematodes are less complex in terms of chemosensory inputs than are those inhabited by nonparasitic nematodes (Supplementary Note). The L. loa genome does, however, encode significantly more chemoreceptors than do other filarial nematodes (P < 0.05, Fisher's exact test), which may be related to the increased mobility of L. loa adult worms.
Phylogenetic profiling of metabolism
Previous genomic analysis identified five biosynthetic pathways (heme, riboflavin, FAD, glutathione and nucleotide synthesis) present in Wolbachia but missing from its relatives, for example, Rickettsia. These Wolbachia-encoded pathways were hypothesized to provide metabolites needed by their filarial hosts6. As L. loa lacks Wolbachia, it was theorized that the L. loa genome must encode genes to replace these pathways, potentially laterally transferred from Wolbachia to an ancestor of L. loa. However, no transfers relating to these metabolic functions were apparent. Thus, we generated complete metabolic pathway reconstructions for nine nematode and four Wolbachia genomes (Table 2 and Supplementary Tables 24 and 25) to determine how L. loa acquires these metabolites and placed the results in an evolutionary context. None of the five 'complementary' pathways differed between L. loa and the other filarial nematodes, calling into question the role of these pathways in filarial-Wolbachia symbiosis.
Furthermore, in only two pathways (heme and nucleotide synthesis) did the filarial genomes differ from those of the other nematodes. The FAD and glutathione pathways are complete in all nematode genomes, whereas the riboflavin pathway is missing from all nematode genomes. The heme biosynthesis pathway, previously reported to be absent in B. malayi7, is missing from not only the filarial worms but also all nematode genomes characterized so far. Experimental work on C. elegans (which is also Wolbachia free) has shown that it cannot synthesize heme de novo23. B. malayi has been previously noted as having a single member of the heme synthesis pathway, ferrochelatase (an enzyme that catalyzes the last step in heme synthesis7; Supplementary Note). The gene encoding ferrochelatase is also present in the L. loa and W. bancrofti genomes but is absent in all other nematode genomes, including that of A. suum. It is possible that this gene in filarial nematodes is not involved in heme synthesis but rather in an alternate, unknown pathway.
Similarly to B. malayi, both L. loa and W. bancrofti lack the ability to synthesize nucleotides de novo. All three filarial genomes lack the majority of the proteins involved in the purine synthesis pathway, as well as the first enzyme involved in the pyrimidine synthesis pathway (Table 2 and Supplementary Table 24). Other nematodes have also lost portions of these pathways; the purine synthesis pathway has been largely lost in P. pacificus and M. hapla, whereas the first two enzymes in the pyrimidine synthesis pathway have been lost in T. spiralis. These multiple and probably independent losses could underscore a general flexibility in the need for de novo nucleotide synthesis in nematodes. All nematodes, including the filariae, have complete sets of purine and pyrimidine interconversion pathways (Supplementary Table 24), implying that they could generate all necessary nucleotides from a single purine and pyrimidine source, a concept supported by experimental data in B. malayi24. Filarial genomes encode two purine-specific 5′ nucleotidases for salvage, whereas all other nematodes encode only one; the extra copy in the filariae seems to have arisen from a single gene duplication event and diverged markedly from the ancestral gene (Supplementary Fig. 7). Additionally, we profiled known nematode and Wolbachia transporters linked to these pathways and found no evidence of differences between filarial and nonfilarial nematodes or among Wolbachia endosymbionts (Supplementary Note and Supplementary Fig. 8). Given the uniformity of these pathways across nematodes and the apparent lack of any related transfers of Wolbachia DNA to the L. loa genome, it is probable that the symbiotic role of Wolbachia in filarial nematodes either lies outside these pathways or involves more subtle metabolic supplementation rather than the wholesale provision of unproduced metabolites.
The only metabolic pathway found to differ in gene content between L. loa and other nematodes with sequenced genomes is vitamin B6 synthesis and salvage. Most nematode genomes encode single copies of the two enzymes involved in vitamin B6 salvage, but the L. loa genome encodes five copies of the second enzyme, pyridoxal 5′-phosphate synthase (Supplementary Note). This pathway also differed among Wolbachia genomes. Although both of the insect Wolbachia genomes also encoded two genes involved in the synthesis of vitamin B6 (pdxJ and pdxK), neither of the filarial Wolbachia genomes did (the difference between Wolbachia of B. malayi and Wolbachia of Drosophila melanogaster was noted previously6). If the filarial Wolbachia endosymbionts need to acquire vitamin B6 exogenously, this could explain a metabolic need of Wolbachia that is fulfilled by the nematode. However, with that hypothesis in mind, it is unclear why L. loa, the one pathogenic filarial nematode without Wolbachia, would encode a greater number of vitamin B6 salvage genes than either B. malayi or W. bancrofti. We could not exclude differences in pyridoxine transporters, as we could identify no orthologs of known transporters in either nematode or Wolbachia genomes (Supplementary Note).
The study of some nematode genomes has already provided great insight into the genomic structure, biology and evolution of this major division of nematode parasites. With the release of the genome of L. loa, a human pathogen and parasitic nematode that does not contain Wolbachia, we have been able to provide insights into the dispensability of this endosymbiont that deepen the mystery surrounding the 'essential nature' of Wolbachia for many filarial worms.
Through large-scale genomic comparisons within the phylum Nematoda, we have not only been able to define molecules and pathways that are either L. loa–specific or filaria-specific but also, by comparison with nonparasitic nematodes (for example, C. elegans), gained a glimpse into the nature of parasitism itself. Moreover, this effort has identified new targets for intervention that should aid programs aimed at the control and elimination of these important but neglected parasites.
Sequencing and assembly.
For L. loa, 5 × 10⁵ microfilariae were purified during a therapeutic apheresis from a patient with loiasis infected in Cameroon seen at the NIH under protocol 88-I-83 (NCT00001230). A single unfertilized adult W. bancrofti worm was obtained under ultrasonic guidance (as part of protocol NCT00339417) in Tieneguebougou, Mali. A single adult O. volvulus male was isolated from a surgically removed subcutaneous nodule in Ecuador after collagenase digestion. Genomic DNA for all samples was prepared using the Qiagen genomic DNA kit (Qiagen, Gaithersburg, MD). DNA obtained from W. bancrofti and O. volvulus was amplified using the Qiagen Repli-g Midi Kit. For L. loa, W. bancrofti and O. volvulus, approximately 50, 10 and 5 μg of DNA, respectively, was used for genomic sequencing. For L. loa, 454 shotgun fragment and 3-kb jumping sequencing libraries were prepared and sequenced as previously described25. Only fragment libraries were constructed for W. bancrofti and O. volvulus. Assemblies were then generated using Newbler version 2.1 (Roche 454 Life Sciences). Given the overall low coverage of the W. bancrofti and O. volvulus assemblies (5×–12×), no bias normalization was done for the whole-genome amplified sequence data. Also for the W. bancrofti and O. volvulus assemblies, contigs were screened by BLASTing against GenBank's nonredundant nucleotide database (NT) using a cutoff of 1 × 10⁻²⁵ and minimum match length of 100 bp, and all contigs where the top match was to Wolbachia were removed. Any contigs remaining in the nematode assembly that had secondary matches to Wolbachia were screened manually to ensure that no large chimeric contigs had been generated and retained. Unassembled reads were also screened for the Wolbachia sequence using the same BLAST parameters and database.
Unassembled reads identified as Wolbachia, along with reads underlying the contigs identified as Wolbachia, were assembled together using Newbler version 2.1 to generate the Wolbachia genome assemblies.
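The contig screen described above can be sketched as a filter over BLAST hits. This is a minimal illustration, assuming that the 1 × 10⁻²⁵ cutoff is an E value, that the "top match" is the passing hit with the highest bit score, and that each hit has already been labeled with the taxon of its subject sequence; the `Hit` record and both function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one BLAST hit of a contig against NT.
Hit = namedtuple("Hit", "contig subject_taxon evalue align_len bitscore")


def passing(hit, max_evalue=1e-25, min_len=100):
    """Apply the E-value and minimum match length thresholds."""
    return hit.evalue <= max_evalue and hit.align_len >= min_len


def screen_contigs(hits):
    """Return (contigs to remove, contigs to review manually).

    A contig is removed when its best passing hit is to Wolbachia.
    A contig is flagged for review when it is kept but has a passing
    secondary hit to Wolbachia (a possible chimera).
    """
    best = {}
    has_wolbachia_hit = set()
    for h in hits:
        if not passing(h):
            continue
        if "Wolbachia" in h.subject_taxon:
            has_wolbachia_hit.add(h.contig)
        if h.contig not in best or h.bitscore > best[h.contig].bitscore:
            best[h.contig] = h
    remove = {c for c, h in best.items() if "Wolbachia" in h.subject_taxon}
    review = has_wolbachia_hit - remove
    return remove, review


# Toy hits (invented values).
hits = [
    Hit("c1", "Wolbachia endosymbiont of Brugia malayi", 1e-80, 900, 1500.0),
    Hit("c1", "Brugia malayi", 1e-30, 150, 200.0),
    Hit("c2", "Brugia malayi", 1e-90, 1200, 2000.0),
    Hit("c2", "Wolbachia sp.", 1e-40, 300, 400.0),
    Hit("c3", "Wolbachia sp.", 1e-10, 500, 90.0),   # fails E-value cutoff
    Hit("c4", "Wolbachia sp.", 1e-50, 80, 150.0),   # fails length cutoff
]
remove, review = screen_contigs(hits)
print(sorted(remove), sorted(review))
```

Separating removal from manual review mirrors the two-stage procedure in the text: automatic exclusion by top hit, then inspection of retained contigs with secondary Wolbachia matches.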
Repeat content analysis.
Repeat content was identified using RepeatScout26 followed by RepeatMasker using both nematode repeats from RepBase v17.06 (ref. 27) and the output from RepeatScout. Only hits with a Smith-Waterman score >250 were maintained. Additional repeats were then identified on the basis of abnormally high read coverage in the genome assemblies using genome sequence scanning with hysteresis triggering. Positions with read depth 20 times the mode of the read depth distribution switched the 'collapsed reads' state to on during the scanning process, and positions with read depth lower than 10 times the mode switched the 'collapsed reads' state to off. Only regions longer than 100 nucleotides were reported. Read mapping was performed by runMapping application of the Newbler suite28. The output was converted to SAM file format by the seq.Newbler2SAM option of the GLU package. Only the best alignment of each read was kept. Read depth was calculated by the genomeCoverageBed program of BEDTools suite29.
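The hysteresis-triggered scan described above can be illustrated with a short sketch. It assumes a per-base depth list and a precomputed mode, and it treats "20 times the mode" as an inclusive threshold; the function name and the toy depth profile are illustrative.

```python
def collapsed_regions(depth, mode, on_factor=20, off_factor=10, min_len=100):
    """Scan per-base read depth with two thresholds (hysteresis).

    The 'collapsed reads' state switches on when depth reaches
    on_factor * mode and switches off only when depth drops below
    off_factor * mode. Regions are returned as half-open
    (start, end) intervals and kept only if longer than min_len.
    """
    regions = []
    start = None  # None means the state is currently off
    for pos, d in enumerate(depth):
        if start is None:
            if d >= on_factor * mode:
                start = pos
        elif d < off_factor * mode:
            if pos - start > min_len:
                regions.append((start, pos))
            start = None
    # Close a region that runs to the end of the sequence.
    if start is not None and len(depth) - start > min_len:
        regions.append((start, len(depth)))
    return regions


# Toy depth profile with mode 10 (invented values).
depth = [10] * 50 + [250] * 60 + [150] * 70 + [50] * 30 + [220] * 40 + [10] * 20
print(collapsed_regions(depth, mode=10))
```

Using two thresholds keeps a region open while depth fluctuates between 10 and 20 times the mode, so one collapsed repeat is not split into several short calls.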
RNA was prepared from one million L. loa microfilariae purified from the blood of a patient. Under liquid nitrogen, the microfilariae were disrupted by a stainless steel piston apparatus. Total RNA was extracted using the RNeasy Kit (Qiagen, Valencia, CA, USA). A non–strand specific complementary DNA (cDNA) library for Illumina paired-end sequencing was prepared from ~37 ng of total RNA as previously described30 with the following modifications. RNA was treated with Turbo DNase (Ambion, TX) and fragmented by heating at 80 °C for 3 min in 1× fragmentation buffer (Affymetrix, CA) before cDNA synthesis. Sequencing adaptor ligation was performed using 4,000 units of T4 DNA ligase (New England Biolabs, MA) at 16 °C overnight. After adaptor ligation, the resulting library was cleaned, size selected twice using 0.7× volumes of Ampure beads (Beckman Coulter Genomics, MA), enriched using 18 cycles of PCR and cleaned using 0.7× volumes of Ampure beads (Beckman Coulter Genomics, MA). The resulting Illumina sequencing library was sequenced with 76 base paired-end reads on an Illumina GAII instrument (v1.8 analysis pipeline) following the manufacturer's recommendations (Illumina, CA).
Identification of transfers (nuwts).
An initial search of the Wolbachia of B. malayi genome against the L. loa genome was done using BLASTN with a cutoff of 1 × 10⁻⁵. After this assembly-based search, nuclear Wolbachia transfers (nuwts) were identified through a screen of the L. loa sequencing reads as being >80% identical to Wolbachia sequences over 50% of the read. Searches were refined to examine reads with >50 bp match to Wolbachia and were manually curated to remove spurious matches that had a nematode ancestry. Reads matching the bacterial ribosomal RNA (rRNA) were removed, as they could arise from any bacterial genome that might be contaminating the sample. Regions of homology <50 bp were included if they were detected through analysis of an adjacent region with homology over >50 bp. All of the reads containing nuwts were mapped back to the L. loa genome to identify the consensus sequence, and the relationship was confirmed using BLASTN to NT. Phylogenetic analysis was conducted on nucleotide sequences of predicted nuwts using RAxML30.
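The read-level thresholds above can be written as a single predicate. This sketch combines the initial screen (>80% identity over 50% of the read) with the refined >50-bp requirement, which is an assumption about how the two steps compose; the manual curation, rRNA removal and rescue of adjacent short regions are not modeled, and all names are hypothetical.

```python
def is_candidate_nuwt_read(read_len, align_len, pct_identity,
                           min_identity=80.0, min_fraction=0.5,
                           min_match_bp=50):
    """Return True if a read-to-Wolbachia alignment passes the screen.

    read_len     -- length of the sequencing read in bp
    align_len    -- length of the aligned portion of the read in bp
    pct_identity -- percent identity of the alignment
    """
    covers_enough = align_len >= min_fraction * read_len
    identical_enough = pct_identity > min_identity
    long_enough = align_len > min_match_bp
    return covers_enough and identical_enough and long_enough


# Toy alignments (invented values): (read_len, align_len, pct_identity)
examples = [(400, 220, 92.0), (400, 150, 95.0), (80, 45, 99.0), (400, 300, 78.0)]
print([is_candidate_nuwt_read(*e) for e in examples])
```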
Genes for both L. loa and W. bancrofti were predicted using a combination of ab initio gene prediction tools as previously described31. We also used TBLASTN to search the genome assembly against protein sequences of the following species: C. elegans, C. briggsae, Schistosoma mansoni, Schistosoma japonicum and B. malayi (downloaded from GenBank on February 16, 2010). The top BLAST hits were used to construct GeneWise32 gene models. In addition, we generated gene models using available EST data from L. loa, W. bancrofti, O. volvulus and B. malayi (downloaded from GenBank on December 2, 2009). All of these models were used as input into EVM33 to generate combined gene predictions. To incorporate the L. loa RNA-Seq data, we aligned all RNA-Seq reads to the L. loa genome using BLAT34. Next, we used the Inchworm module of the Trinity package35 with default settings in genome-guided mode to assemble the reads into EST-like transcripts. These transcripts, along with the models from EVM, were input into PASA33 for gene model improvement. Gene sets were subsequently filtered to remove repeats, including genes overlapping rRNA, transfer RNA (tRNA) or output from RepeatScout26 or TransposonPSI. Every annotated gene was given a locus identification of the form LOAG_##### (L. loa) or WUBG_##### (W. bancrofti). Pfam domains within each gene were identified using Hmmer3 (ref. 36), and gene ontology terms were assigned using BLAST2GO37. Secretion signals and transmembrane domains were identified using SignalP 4.0 (ref. 38) and TmHmm39, respectively. Core eukaryotic genes were identified using CEGMA40.
Identification of fragmented genes.
Fragmented W. bancrofti genes were associated to their putative intact orthologs in L. loa or B. malayi by unidirectional BLAST of W. bancrofti peptides against peptides from the reference genome (L. loa or B. malayi). W. bancrofti proteins with <80% similarity to the reference, on the basis of query length, and an E value >1 × 10⁻¹⁰ were disregarded. A gene was considered fragmented if its length in W. bancrofti was at least 50% shorter than its length in its respective ortholog. The number of reference genome orthologs with multiple assigned fragments in W. bancrofti was then used to extrapolate a corrected gene count for W. bancrofti. An identical analysis was done for L. loa genes by comparison to B. malayi.
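A rough sketch of this fragment analysis follows. The thresholds come from the text, but two points are assumptions: hits are kept only when they pass both the similarity and the E-value criteria, and the corrected count simply collapses every group of fragments that share a reference ortholog into one gene. The text does not specify the extrapolation, so `corrected_gene_count` is a hypothetical stand-in, as are the record and function names.

```python
from collections import defaultdict, namedtuple

# Hypothetical record for the best hit of one query peptide.
BestHit = namedtuple("BestHit", "query ref similarity evalue query_len ref_len")


def fragment_groups(hits, min_similarity=80.0, max_evalue=1e-10,
                    max_len_ratio=0.5):
    """Group putative gene fragments by their reference ortholog."""
    groups = defaultdict(list)
    for h in hits:
        # Disregard weak matches.
        if h.similarity < min_similarity or h.evalue > max_evalue:
            continue
        # Fragmented: query is at least 50% shorter than the ortholog.
        if h.query_len <= max_len_ratio * h.ref_len:
            groups[h.ref].append(h.query)
    return dict(groups)


def corrected_gene_count(total_genes, groups):
    """Count each multi-fragment group as one gene (assumed correction)."""
    surplus = sum(len(frags) - 1 for frags in groups.values() if len(frags) > 1)
    return total_genes - surplus


# Toy hits (invented values).
hits = [
    BestHit("w1", "r1", 95.0, 1e-40, 100, 400),
    BestHit("w2", "r1", 90.0, 1e-30, 150, 400),
    BestHit("w3", "r2", 99.0, 1e-100, 380, 400),  # near full length
    BestHit("w4", "r3", 85.0, 1e-20, 120, 300),
    BestHit("w5", "r1", 70.0, 1e-50, 100, 400),   # similarity too low
]
groups = fragment_groups(hits)
print(groups, corrected_gene_count(5, groups))
```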
Whole-genome alignments of C. elegans, B. malayi and W. bancrofti against L. loa were performed by progressive Mauve41 with default parameters. The extent of the alignment between a pair of sequences was defined as the length spanning all their respective colinear blocks. For each comparison, chromosomes or scaffolds having the longest alignment against L. loa scaffold number 4 (100 scaffolds from W. bancrofti and 30 scaffolds from B. malayi and C. elegans chromosome 3) were selected for visualization. For the systematic evaluation of synteny, pairwise syntenic blocks between the genomes of L. loa, C. elegans, B. malayi and W. bancrofti were defined by DAGchainer42 with the minimum number of colinear genes set to three.
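To make the "minimum number of colinear genes set to three" criterion concrete, the sketch below marks genes that sit in runs of at least three orthologs that are strictly adjacent and in consistent order in both genomes. This is a deliberately simplified stand-in: DAGchainer itself scores chains and tolerates intervening genes, which this code does not attempt. All names and the toy data are hypothetical.

```python
def syntenic_genes(order_a, pos_b, min_genes=3):
    """Return the genes of genome A that lie in colinear runs.

    order_a -- gene IDs in order along one scaffold of genome A
    pos_b   -- dict mapping a gene of A to (scaffold, index) of its
               ortholog in genome B; genes without orthologs are absent
    """
    anchored = [g for g in order_a if g in pos_b]
    syntenic = set()
    run = anchored[:1]
    direction = 0  # 0 = not yet set, +1 = same strand order, -1 = inverted
    for prev, cur in zip(anchored, anchored[1:]):
        (scaf_prev, idx_prev), (scaf_cur, idx_cur) = pos_b[prev], pos_b[cur]
        step = idx_cur - idx_prev
        if scaf_prev == scaf_cur and step in (1, -1) and direction in (0, step):
            run.append(cur)
            direction = step
        else:
            if len(run) >= min_genes:
                syntenic.update(run)
            run = [cur]
            direction = 0
    if len(run) >= min_genes:
        syntenic.update(run)
    return syntenic


# Toy scaffold of seven genes (invented orthology).
order_a = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]
pos_b = {"a1": ("s1", 10), "a2": ("s1", 11), "a3": ("s1", 12),
         "a5": ("s2", 4), "a6": ("s2", 3), "a7": ("s9", 0)}
block = syntenic_genes(order_a, pos_b)
print(sorted(block), len(block) / len(order_a))
```

The last line reports the fraction of all genes on the scaffold that fall in a syntenic block, which is the kind of percentage quoted in the synteny comparisons.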
Gene clustering and phylogenetic analysis.
We built a comparative set of genomes including those sequenced in this study and those of P. pacificus (from www.pristionchus.org), C. elegans (release 224 from WormBase), C. briggsae (CAAC00000000), M. hapla (from www.pngg.org), B. malayi (release 230 from WormBase), A. suum (published release from WormBase) and T. spiralis (ABIR00000000). Genes were clustered using OrthoMCL with a Markov inflation index of 1.5 and a maximum E value of 1 × 10⁻⁵ (ref. 43). Amino acid sequences of orthologs present as single copies in all genomes were aligned using MUSCLE44 and concatenated. We then estimated phylogenies from this data set using three methods. Parsimony bootstrapping analysis was conducted with PAUP45 using unweighted characters and 1,000 bootstrap replicates. For maximum likelihood analysis, we first selected the JG model46 using ModelGenerator47 and then used the PROTCATJG model in RAxML30 with 1,000 bootstrap replicates. For Bayesian analysis, we used MrBayes48 with a mixed amino acid model and gamma-distributed rates. We ran the analysis with one chain for 1 million generations, sampling every 500 generations and discarding the first 25% of samples as burn in. Enrichment analyses were conducted using Fisher's exact test, and multiple comparisons were corrected using the false discovery rate49.
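The enrichment statistics can be sketched with the standard library alone. The code below implements a two-sided Fisher's exact test for a 2 × 2 table and a Benjamini-Hochberg adjustment; taking Benjamini-Hochberg as the false discovery rate procedure is an assumption, as is the software used, which the text does not name. Function names and the example counts are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    For a domain enrichment test: a = genes with the domain in genome 1,
    b = genes without it in genome 1, c and d likewise for the
    comparison set. The p value sums the probabilities of all tables
    that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Step from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Illustrative table and p values (not data from this study).
print(fisher_exact_two_sided(3, 1, 1, 3))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Adjusting across all tested domains matters because thousands of domains are tested at once, so an unadjusted P < 0.05 threshold would be expected to yield many false positives.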
Initial sets of protein kinases were identified by orthology with annotated C. elegans kinases. Kinases without orthologs were identified in a search of the proteome against a protein kinase hidden Markov model derived from an alignment of Dictyostelium protein kinases50 using a cutoff score of –66. Low-scoring sequences were additionally screened for conservation of known protein kinase sequence motifs. All protein kinases were classified using a controlled vocabulary51, 52, and classifications of filarial kinases with C. elegans orthologs were mapped from the curated set from the KinBase database. Kinases without orthologs in C. elegans were searched against the curated set using BLAST and classified if the top three hits agreed. Orthology across all nematodes was then used to identify potentially missed kinases and ensure consistent classification.
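The rule for kinases without C. elegans orthologs (classify only if the top three BLAST hits agree) can be sketched as below. The sketch assumes hits are ranked by bit score and that a query with fewer than three hits stays unclassified; both choices, along with the function name and input layout, are illustrative assumptions rather than details given in the text.

```python
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, float]  # (curated kinase identifier, bit score)


def classify_by_top_hits(
    hits: List[Hit],
    curated_class: Dict[str, str],
    n: int = 3,
) -> Optional[str]:
    """Return the shared class of the top n hits, or None if they disagree.

    Assumptions: hits are ranked by descending bit score, and a query
    with fewer than n hits is left unclassified.
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]
    if len(ranked) < n:
        return None
    classes = {curated_class.get(subject) for subject, _ in ranked}
    # Exactly one class, and every hit must be present in the curated set.
    if len(classes) == 1 and None not in classes:
        return classes.pop()
    return None
```

Sequences that return `None` here would be the ones left for the later orthology-based pass across all nematodes.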
In addition to the nine nematode genomes listed above, we included three Wolbachia genomes, from the endosymbionts of B. malayi (AE017321), D. melanogaster (AE017196) and C. pipiens (AM999887). Metabolic pathways were characterized using Pathway Tools53. Metabolic reconstruction was performed using EFICAz2 (ref. 54) to assign Enzyme Commission numbers to each enzyme. Enzyme Commission numbers and gene names were used as input to the PathoLogic software55, with the transport-identification-parser and pathway-hole-filler options set, to assign MetaCyc56 pathways for each organism. The full set of metabolic pathways for each genome is available at the WormCyc database.
RepeatMasker, http://www.repeatmasker.org; GLU package, http://code.google.com/p/glu-genetics; TransposonPSI, http://transposonpsi.sourceforge.net; Pristionchus database, http://www.pristionchus.org; WormBase, http://www.wormbase.org; KinBase database, http://www.kinase.com; WormCyc database, http://wormcyc.broadinstitute.org.
All genome assemblies are available in GenBank under the following BioProject identifiers and accession numbers, respectively: L. loa (PRJNA37757 and ADBU02000000), W. bancrofti (PRJNA37759 and ADBV01000000), O. volvulus (PRJNA37761 and ADBW01000000), Wolbachia of W. bancrofti (PRJNA43539 and ADHD00000000) and Wolbachia of O. volvulus (PRJNA43537 and ADHE00000000).
1. Control of neglected tropical diseases. N. Engl. J. Med. 357, 1018–1027 (2007).
2. Tropical Infectious Diseases: Principles, Pathogens and Practice (eds. Guerrant, R.L., Walker, D.H. & Weller, P.F.) 735–740 (Churchill Livingstone, 2011).
3. Serious reactions after mass treatment of onchocerciasis with ivermectin in an area endemic for Loa loa infection. Lancet 350, 18–22 (1997).
4. Lymphatic filariasis and onchocerciasis. Lancet 376, 1175–1185 (2010).
5. A randomized trial of doxycycline for Mansonella perstans infection. N. Engl. J. Med. 361, 1448–1458 (2009).
6. The Wolbachia genome of Brugia malayi: endosymbiont evolution within a human pathogenic nematode. PLoS Biol. 3, e121 (2005).
7. Draft genome of the filarial nematode parasite Brugia malayi. Science 317, 1756–1760 (2007).
8. The role of endosymbiotic Wolbachia bacteria in the pathogenesis of river blindness. Science 295, 1892–1895 (2002).
9. Horizontal gene transfer between bacteria and animals. Trends Genet. 27, 157–163 (2011).
10. Altered circulating levels of matrix metalloproteinases and inhibitors associated with elevated type 2 cytokines in lymphatic filarial disease. PLoS Negl. Trop. Dis. 6, e1681 (2012).
11. Modulation of host immune responses by nematode cystatins. Int. J. Parasitol. 33, 1291–1302 (2003).
12. The multiple roles of mps1 in Drosophila female meiosis. PLoS Genet. 3, e113 (2007).
13. Checkpoints: chromosome pairing takes an unexpected twist. Curr. Biol. 11, R865–R868 (2001).
14. Imatinib has a fatal impact on morphology, pairing stability and survival of adult Schistosoma mansoni in vitro. Int. J. Parasitol. 40, 521–526 (2010).
15. Piggy-backing the concept of cancer drugs for schistosomiasis treatment: a tangible perspective? Trends Parasitol. 27, 59–66 (2011).
16. An improved molecular phylogeny of the Nematoda with special emphasis on marine taxa. Mol. Phylogenet. Evol. 42, 622–636 (2007).
17. A molecular evolutionary framework for the phylum Nematoda. Nature 392, 71–75 (1998).
18. Trehalose metabolism genes in Caenorhabditis elegans and filarial nematodes. Int. J. Parasitol. 33, 1195–1206 (2003).
19. Trehalose extends longevity in the nematode Caenorhabditis elegans. Aging Cell 9, 558–569 (2010).
20. odr-10 encodes a seven transmembrane domain olfactory receptor required for responses to the odorant diacetyl. Cell 84, 899–909 (1996).
21. The EGL-4 PKG acts with KIN-29 salt-inducible kinase and protein kinase A to regulate chemoreceptor gene expression and sensory behaviors in Caenorhabditis elegans. Genetics 180, 1475–1491 (2008).
22. Neurons detect increases and decreases in oxygen levels using distinct guanylate cyclases. Neuron 61, 865–879 (2009).
23. Lack of heme synthesis in a free-living eukaryote. Proc. Natl. Acad. Sci. USA 102, 4270–4275 (2005).
24. Exogenous nucleosides are required for the morphogenesis of the human filarial parasite Brugia malayi. J. Parasitol. 90, 1184–1185 (2004).
25. A scalable, fully automated process for construction of sequence-ready barcoded libraries for 454. Genome Biol. 11, R15 (2010).
26. De novo identification of repeat families in large genomes. Bioinformatics 21 (suppl. 1), i351–i358 (2005).
27. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
28. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005).
29. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
30. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006).
31. Approaches to fungal genome annotation. Mycology 2, 118–141 (2011).
32. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).
33. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
34. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
35. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
36. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
37. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005).
38. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785–786 (2011).
39. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580 (2001).
40. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
41. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE 5, e11147 (2010).
42. DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics 20, 3643–3646 (2004).
43. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
44. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
45. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods), version 4.0b10. (Sinauer Associates, Sunderland, Massachusetts, 2003).
46. An improved general amino acid replacement matrix. Mol. Biol. Evol. 25, 1307–1320 (2008).
47. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol. Biol. 6, 29 (2006).
48. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542 (2012).
49. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100, 9440–9445 (2003).
50. The Dictyostelium kinome—analysis of the protein kinases from a simple model organism. PLoS Genet. 2, e38 (2006).
51. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 9, 576–596 (1995).
52. The protein kinase complement of the human genome. Science 298, 1912–1934 (2002).
53. Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology. Brief. Bioinform. 11, 40–79 (2010).
54. EFICAz2: enzyme function inference by a combined approach enhanced by machine learning. BMC Bioinformatics 10, 107 (2009).
55. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res. 33, 6083–6089 (2005).
56. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 40, D742–D753 (2012).
We thank members of the Broad Institute Genomics Platform for sequencing and D. Neafsey for comments on the manuscript. This project has been funded in part by the National Institute of Allergy and Infectious Diseases, US National Institutes of Health (NIH), Department of Health and Human Services under contract number HHSN272200900018C and by the Division of Intramural Research, National Institute of Allergy and Infectious Diseases, NIH. J.C.D.H. is funded by the NIH Director's New Innovator Award Program (1-DP2-OD007372).
- Supplementary Text and Figures (3 MB)
Supplementary Note, Supplementary Figures 1–8 and Supplementary Tables 1, 2, 6, 7, 11, 12, 15, 16, 20, 24 and 25
- Supplementary Table 3 (156 KB)
Novel L. loa repeats identified by RepeatScout.
- Supplementary Table 4 (57 KB)
Novel L. loa repeats identified by scanning the genome for regions with collapsed reads.
- Supplementary Table 5 (45 KB)
Low complexity regions in L. loa genome.
- Supplementary Table 8 (102 KB)
Novel W. bancrofti repeats identified by RepeatScout.
- Supplementary Table 9 (57 KB)
Novel W. bancrofti repeats identified by scanning the genome for regions with collapsed reads.
- Supplementary Table 10 (49 KB)
Low complexity regions in W. bancrofti genome.
- Supplementary Table 13 (213 KB)
Novel B. malayi repeats identified by RepeatScout.
- Supplementary Table 14 (45 KB)
Low complexity regions in B. malayi genome.
- Supplementary Table 17 (414 KB)
Expression levels of L. loa genes in microfilariae.
- Supplementary Table 18 (20 MB)
Comprehensive annotation of the L. loa predicted proteome.
- Supplementary Table 19 (49 KB)
Immunologically relevant genes in the L. loa genome.
- Supplementary Table 21 (197 KB)
Annotation of protein kinases in the L. loa genome.
- Supplementary Table 22 (94 KB)
Phylogenetic profiles of protein kinases in nematode genomes.
- Supplementary Table 23 (45 KB)
Nematode protein kinases from families with human drug targets.