The genome of a subterrestrial nematode reveals adaptations to heat

Weinstein, Deborah J.; Allen, Sarah E.; Lau, Maggie C. Y.; Erasmus, Mariana; Asalone, Kathryn C.; Walters-Conte, Kathryn; Deikus, Gintaras; Sebra, Robert; Borgonie, Gaetan; van Heerden, Esta; Onstott, Tullis C.; Bracht, John R.

doi:10.1038/s41467-019-13245-8

Download PDF

Article
Open access
Published: 21 November 2019

The genome of a subterrestrial nematode reveals adaptations to heat

Nature Communications volume 10, Article number: 5268 (2019) Cite this article

9726 Accesses
19 Citations
256 Altmetric
Metrics details

Subjects

Abstract

The nematode Halicephalobus mephisto was originally discovered inhabiting a deep terrestrial aquifer 1.3 km underground. H. mephisto can thrive under conditions of abiotic stress including heat and minimal oxygen, where it feeds on a community of both chemolithotrophic and heterotrophic prokaryotes in an unusual ecosystem isolated from the surface biosphere. Here we report the comprehensive genome and transcriptome of this organism, identifying a signature of adaptation: an expanded repertoire of 70 kilodalton heat-shock proteins (Hsp70) and avrRpt2 induced gene 1 (AIG1) proteins. The expanded Hsp70 genes are transcriptionally induced upon growth under heat stress, and we find that positive selection is detectable in several members of this family. We further show that AIG1 may have been acquired by horizontal gene transfer (HGT) from a rhizobial fungus. Over one-third of the genes of H. mephisto are novel, highlighting the divergence of this nematode from other sequenced organisms. This work sheds light on the genomic basis of heat tolerance in a complete subterrestrial eukaryotic genome.

Genome-wide signatures of adaptation to extreme environments in red algae

Article Open access 04 January 2023

Chung Hyun Cho, Seung In Park, … Hwan Su Yoon

Whole genome analysis and cold adaptation strategies of Pseudomonas sivasensis W-6 isolated from the Napahai plateau wetland

Article Open access 30 August 2023

Lingling Xiong, Yanmei Li, … Xiuling Ji

Borgs are giant genetic elements with potential to expand metabolic capacity

Article Open access 19 October 2022

Basem Al-Shayeb, Marie C. Schoelmerich, … Jillian F. Banfield

Introduction

Halicephalobus mephisto was discovered inhabiting a fluid-filled aquifer accessed from the Beatrix Gold Mine in South Africa at 1.3 km below the surface¹. Radiocarbon dating indicates the aquifer water is over 6000 years old¹, and the lack of surface ³H infiltration, a remnant of atmospheric atomic testing, highlights its isolation from the surface biosphere¹. The water is warm (37 °C), alkaline (pH 7.9), hypoxic (0.42–2.3 mg/L dissolved O_2,), and rich in biogenic methane (CH₄)^1,2,3. In spite of these challenging conditions, a thriving, complex microbial community exists in this extreme environment, including chemolithoautotrophic organisms that extract energy from the subterrestrial rock and fix inorganic carbon^2,4. Syntrophic relationships link sulfur-oxidizing denitrifying bacteria, sulfate reducers, methanogens, and anaerobic methane oxidizing organisms into a complex mutually reinforcing microbial food web² that supports a rich assemblage of eukaryotic opportunistic predators, including nematodes, rotifers, and protists⁵. With the exception of H. mephisto none of these eukaryotic organisms have been cultured in the laboratory, and none have had their genomes sequenced and analyzed until now.

Nematodes encode small, remarkably dynamic genomes well suited to studies of adaptation^6,7. Among the most abundant animals on earth, nematodes have adapted to an incredibly diverse set of environments: from hot springs to polar ice, soil, fresh, and saltwater⁸, acid seeps⁹, and the deep terrestrial subsurface¹, with a wealth of comparative genomic data available. Dynamic gene family expansion^6,10 and shrinkage¹¹ have proven good signatures of evolutionary adaptive selection in crown eukaryotes, including nematoda¹². Here, we perform comprehensive genomic and transcriptomic studies in H. mephisto, revealing the evolutionary adaptive response to a subterrestrial environment, including expanded gene families and patterns of expression under heat stress.

Results

Assembly, annotation, and phylogenetic analysis

De novo DNA sequence assembly with Illumina data and scaffolding with PacBio reads, yielded a small complete assembly of 61.4 Mb comprising 880 scaffolds with N50 of 313 kb, though 90% of the sequence is encoded on just 193 scaffolds. The longest scaffold is just under 2.55 Mb (Table 1). Several lines of evidence suggest that this is a highly complete genome. The Core Eukaryotic Genes Mapping Approach (CEGMA)^13,14 identified 240 of 248 core eukaryotic genes for a completeness of 97%, and tRNscan-SE¹⁵ identified 352 tRNAs encoding all 20 amino acids plus selenocysteine. Benchmarking Universal Single Copy Orthologs (BUSCO)¹⁶ estimated the completeness as 81.4%, but manual inspection of its output shows that 79 apparently missing genes are actually detected, with good e-values (median 6e-19). Given that BUSCO’s thresholds are established from eight nematode genes from Clades I, III, and V¹⁷, it may not be well suited to divergent Clade IV nematodes such as H. mephisto, since it also scored the P. redivivus genome (98% complete¹⁸) as only 82.1% complete. Therefore, considering these divergent matches to be valid, we conclude that for H. mephisto BUSCO detected 946/982 orthologous genes, for a completeness of 96%, consistent with CEGMA. Reinforcing the completeness of the H. mephisto assembly and annotation, a quantitative comparison of 3,252 protein domains shows a strong correlation with Caenorhabditis elegans (Fig. 1b).

Table 1 Genome and transcriptome data for H. mephisto and comparison to C. elegans, M. hapla, and P. redivivus

Full size table

The repetitive component of the H. mephisto genome is highly divergent. Using a custom RepeatModeler repeat library, RepeatMasker masked 24.3% of the genome, denoting 21.1% as interspersed repeats, 87.3% of which are unknown (Table 2). Evaluation of these sequences with nhmmer¹⁹ and the DFAM database positively identified 44.3% of these as helitrons, with 8.8% retrotransposons and 23.6% DNA transposons (Table 2). However, consistent with the genomic divergence of H. mephisto, this repetitive element repertoire appears extremely different from known elements, including many unique or novel repeat families needing further characterization, and a significant 23.3% remain unclassified by either algorithm (Table 2).

Table 2 Repetitive element composition of the H. mephisto genome

Full size table

The H. mephisto nuclear genome was annotated with Maker2²⁰, TopHat²¹, StringTie²², and TransDecoder²³, and transcripts were clustered with gffcompare to define a total of 16,186 protein-coding loci. An additional 1,023 loci were identified that do not encode proteins over 50 amino acids and are candidate noncoding transcripts, for a total of n = 17,209 transcribed loci in the H. mephisto genome, producing 34,605 transcripts for an average of 2.0 transcripts per locus (Table 1). Intron lengths averaged 473 compared to 320 bp for C. elegans (Table 1).

We used single-copy orthologous proteins (SCOGS) to build a phylogenetic tree placing H. mephisto as a distant relative of free-dwelling Panagrellus redivivus, the nearest fully sequenced nematode relative, within Clade IV (Fig. 1a). The comparison of domain counts between C. elegans and H. mephisto uncovered two domains strikingly enriched in H. mephisto: avrRpt2-induced gene, AIG1 (84 domains vs. 0) and 70-kD heat-shock protein, Hsp70 (126 domains vs. 15) (Fig. 1b). The most over-represented domains in C. elegans are the Meprin And TRAF-Homology (MATH) domain, Fog-2 Homology Domain (FTH) and F-box associated (FBA_2) domains, all apparent lineage-specific expansions in Caenorhabditis^24,25 (Fig. 1b). We used OrthoVenn²⁶ to identify orthologous genes between D. melanogaster, C. elegans, P. redivivus, and the 16,186 H. mephisto proteins, identifying a set of 5,397 shared among all nematodes, with 3,233 shared among all four invertebrates. Among the 417 H. mephisto-specific orthologous groups, the largest cluster was Hsp70, with 107 proteins.

Hsp70 expansion

The expansion of Hsp70 is particularly evocative because these well-studied heat-activated chaperones refold proteins denatured by heat^27,28,29,30 as part of a coordinated response to heat-shock^31,32,33,34. Even more intriguing, Hsp70 are expanded in organisms adapted to environmental thermal stress, but not in nematodes parasitic on endothermic hosts, suggesting the expansion of Hsp70 may be a general strategy for adaptation to environmental, not parasitic, heat (Fig. 1c). Bayesian phylogenetic analysis of Hsp70 proteins recovered known paralogs specific to cellular compartments including mitochondrial and endoplasmic reticulum (ER)³⁵ along with a cluster grouping human, mouse, and nematode genes (Cluster I, Fig. 2a). The recovered Hsp70 gene tree topology is robust, given that the same structure was recovered by maximum likelihood (Supplementary Fig. 1). The human sequences in Cluster I, include well-characterized Hsp70 sequences³⁶. Cluster II is a new 37-member H. mephisto-only group, which surprisingly is most closely related to another novel Diploscapter cluster with 59 genes (Cluster III, Fig. 2a). These data suggest that the Hsp70 gene family has undergone significant amplification within the Diploscapter and Halicephalobus lineages which, owing to their evolutionary distance (Fig. 1a), most likely did not inherit these expanded gene families from a common ancestor. Instead, we propose these genes underwent independent expansions in both lineages under shared evolutionary pressure to adaptat to heat stress: Diploscapter pachys is thermotolerant³⁷, while Diploscapter coronatus is a facultative parasite of humans³⁸ and a member of the genus has been found in thermal waters³⁹.

A signature of adaptive evolutionary change is positive selection, in which mutations altering amino acids (dN) are enriched relative to (presumably neutral) synonymous changes (dS), giving a dN/dS ratio (ω) greater than 1⁴⁰. A value of ω less than 1 indicates elimination of mutations that alter the amino acid sequence, also known as purifying selection, and occurs when protein function is preserved by evolution⁴⁰. Given the long branch lengths, gene family expansions, and potential for selection acting on Hsp70 gene function, we used PAML⁴¹ to test for statistical evidence of positive selection. To facilitate this analysis, we created a new Bayesian phylogeny using a subset of genes from H. mephisto, the Diploscapter species, and C. elegans outgroups, resulting in a well-resolved Bayesian phylogeny (Fig. 2b). We ran a branch-sites test in PAML, which estimates two ω parameters for different codons on a preselected foreground branch of the phylogenetic tree. The ω₁ parameter accounts for pervasive purifying selection at specific sites (codons) but a second ω₂ measures values greater or equal to 1 (from neutral to positive selection) at other sites. Because ω₂ is estimated from the data, it is an indicator of positive selection if it is reported to be above 1. Furthermore, by performing the test twice, once with ω₂ fixed at 1 (neutrality, a null model) and once allowing it to be freely estimated from the data, a likelihood ratio test (LRT) can be used to derive a p value quantifying the strength of positive selection on a particular branch⁴². In our analysis, positive selection was detected along the long branch leading to the H. mephisto cluster: ω₂ of 197 on 89% of amino acids, p = 0.04 (Table 3), however, it was not robust to Bonferroni correction for multiple hypothesis testing, which may be too conservative⁴³. Given that this branch has by far the most sites (89%) under positive selection, we have made it semi-bold (Fig. 2b). Values of ω₂ greater than 1 were detected on ten of eleven branches from clusters II and III (Table 3) with four giving a p value that is significant after correction for multiple hypothesis testing (lines strongly bolded in Fig. 2b). In particular, the root of the Diploscapter cluster (Fig. 2b, branch K) showed evidence of strong positive selection (Table 3). As controls, we tested a Cluster I short branch (L) and mitochondrial gene (M), which showed no evidence of positive selection (Fig. 2b, Table 3). Together these data suggest that positive selection is detectable in specific Hsp70 lineages in both Halicephalobus and Diploscapter.

Table 3 Branch-site analysis of dN/dS ratios (ω) from tree in Fig. 2b

Full size table

AIG1 expansion

AIG1 was originally identified as a pathogen response gene in plants⁴⁴, but is also involved in survival of T cells in mammals, where they were named immune-associated nucleotide binding proteins (IANs),⁴⁵ or GTPase of immunity-associated proteins (GIMAPs)⁴⁶. These proteins function as GTP-binding molecular switches controlling cell fates⁴⁶. AIG1 was originally reported to be completely absent from nematodes⁴⁵, but by relaxing the statistical stringency we find a single copy of the domain (Y67D2.4) in C. elegans annotated as a homolog of human mitochondrial ribosome associated GTPase 1 (MTG1), suggesting a possible divergence and expansion of the GTPase superfamily in H. mephisto and other nematodes. Consistent with this, blastp against the nr database and HMMER search of Uniprot reference proteomes identified hundreds of matches with low percent identity (~30%) to the AIG1s identified in H. mephisto. Nonetheless, the matches range from 250 to 300 amino acids in length and with e-values from 1e-20 to 1e-30; they come from species as diverse as nematodes, fungi, arthropods, and the parasitic protist Giardia intestinalis. These sequences are generally annotated as uncharacterized or hypothetical proteins, though some are annotated as p-loop containing nucleoside triphosphate hydrolases, suggesting a large GTPase protein family, previously uncharacterized, resides within eukaryotes.

Consistent with the relatively low percent identity of these genes, most did not align well with the H. mephisto sequences, and if they did align, they did not show good bootstrap support in phylogenetic trees. This suggests the eukaryotic superfamily of GTPases is comprised of distinct and divergent subfamilies, only one of which is the AIG1 domain expanded in H. mephisto. We ultimately were able to obtain good alignments and well-supported phylogenetic trees from only 17 nematode species: an AIG1-like cluster including H. mephisto, Diploscapter coronatus, D. pachys, A. suum, and P. redivivus, while a separate MTG1-related cluster includes the Caenorhabditis sequence and other nematodes (Fig. 3a). Notably, the two distinct clusters resolve with 100% bootstrap support, and while the MTG1-related cluster includes members of all sequenced nematode clades (I, III, IV, and V; no Clade II genomes are currently available⁴⁷), the AIG1-like group includes only members of clades III–V, suggesting a potential origin in the Chromadoria⁴⁷ and extensive amplification in H. mephisto (Fig. 3a).

In order to relate these nematode GTPase subfamilies with the previously identified AIG1/IAN/GIMAP sequences from vertebrates and plants^45,46 we built a tree including a broader taxonomic sampling beyond nematodes (Fig. 3b). To simplify the tree, we included only six nematodes in this analysis, five of which we previously characterized as having the AIG1-like genes: H. mephisto, P. redivivus, A. suum, D. coronatus, and D. pachys, and the C. elegans representative of the MTG1-related proteins. In this tree the original IAN/GIMAP human–plant cluster was recovered^45,48 but deeply rooted from the MTG1-related side of the phylogeny with good branch support (Fig. 3b). We found two sequences from Rhizophagus irregularis, a plant–root associated fungus, resolve cleanly into the newly discovered AIG1-like group including H. mephisto (Fig. 3b). The R. irregularis sequences are from two single-nucleus genome sequences, accessions EXX68593.1 and EXX68414⁴⁹. These data suggest that, in contrast to reports of AIG1 being absent in invertebrates^45,50 Clade III–V nematodes do have a previously unknown member of this protein family potentially derived from an ancient horizontal gene transfer from a rhizobial fungus. D. pachys has been reported to inhabit the rhizobial zone of plant roots^37,51, where it is ideally situated to acquire horizontally transferred genes from cohabiting fungus. Given that we found five nematode species host orthologs of the AIG1-like domain in their genomes (Fig. 3a) the HGT event must have been in a Chromadorean ancestor lineage, not in contemporary D. pachys, though H. mephisto has undergone a spectacular expansion of this gene family that is not present in the other lineages, most of which have only a few copies (Fig. 3a). The divergence of suborders Rhabditina (C. elegans, D. pachys, D. coronatus) and Tylenchina (P. redivivus and H. mephisto) has been estimated at 22 million years ago using molecular clock methods⁵². A. suum is a member of Spirurina and separated from the Rhabditina 80 million years ago⁵³, and the putative HGT event must predate these divergences. While inter-domain HGT into eukaryotes has been controversial, transfer from fungus to C. elegans has been documented, setting a precedent for this hypothesized HGT event⁵⁴. Preliminary analysis of codon bias to test for HGT was performed, but caution has been urged given this method’s propensity for misleading results⁵⁵; indeed while we found the codon usage of AIG1 genes differed statistically from the rest of H. mephisto coding sequences, this was also true for our control datasets including collagens, tubulins, and even Hsp70, so we abandoned this approach.

Genes differentially expressed under heat stress

We performed a comprehensive differential expression analysis of H. mephisto transcripts whose expression changes under heat stress, and found that while Hsp70 transcripts are statistically upregulated on exposure to heat, AIG1 transcripts are not (Fig. 4a). Analyzing all transcripts, we identified 285 heat-upregulated and 675 heat-downregulated (Fig. 4b). Because many upregulated transcripts were unknown, and the upregulated set is small, Gene Ontology analysis identified no enriched categories. However, the downregulated set of transcripts was enriched in peptidases, as well as cuticle components and ornithine-oxo-acid transaminase (Table 4 and Fig. 4b). The downregulation of cuticular component is entirely driven by 13 collagens (out of 89 genes in the genome).

Table 4 GO terms enriched in H. mephisto genes downregulated under heat stress

Full size table

As noted, Hsp70 transcripts respond to heat (Fig. 4a) but with two exceptions they do not make the twofold cutoff, and one is actually statistically inhibited on heat stress (Fig. 4b). One AIG1 transcript is included in the statistically downregulated set (Fig. 4b). An intriguing finding is SAX-2 (Sensory AXon guidance 2), a protein involved in neuronal development in C. elegans⁵⁶, and whose inactivation causes a variety of developmental, cellular, and behavioral phenotypes⁵⁷. In H. mephisto heat stress causes an isoform switch, with a longer version, MSTRG.2841.1, expressed exclusively at 25 °C and a shorter version, MSTRG.2841.2, exclusively expressed at 38–40 °C. Thus, sax-2 is identified as a transcript both heat-upregulated (12.9-fold) and heat-downregulated (7.0-fold) on the volcano plot (Fig. 4b). The long isoform is 9,286 bp and encodes a 3,016 amino acid (aa) protein, with the alternative isoform being 132 basepairs shorter owing to an 84 bp change in transcriptional start site and alternate 3′ splice site choice for exon 10 (of 16 exons total). These isoform differences only result in a 16 aa deletion from the predicted protein. Given that the protein is ~3,000 aa, and the 16 aa change does not significantly alter the encoded domains (identified by HMMER as MOR2_PAG1 N, mid, and C-terminal domains), the functional implication of the transcriptional shift and alternative splicing of SAX-2 remains to be investigated in future work.

Among the most strongly heat-induced transcripts was arginine-rich, mutated in early stage tumors (ARMET), also called mesencephalic astrocyte derived neurotrophic factor (MANF), a gene involved in the unfolded protein response (UPR) in the ER⁵⁸. ARMET was upregulated over 44-fold by heat in H. mephisto (Fig. 4b). In mouse and human cells, ARMET interacts directly with the ER Hsp70 protein BIP/GRP78⁵⁹. Therefore, under hot conditions it may be vital for H. mephisto to co-express Hsp70 and ARMET, particularly if they synergize in responding to the damage due to heat in different cellular compartments, Hsp70 in cytosol and ARMET in ER. It appears that H. mephisto has developed different genomic strategies for upregulating these two genes: mild upregulation of many paralogous genes (Hsp70) versus extremely highly induced expression of a single-copy gene (ARMET). Regardless, ARMET is a strong marker of ER stress with cytoprotective roles^58,60,61.

Another upregulated transcript was Bax Inhibitor-1 (BI-1), which was upregulated fourfold (Fig. 4b). BI-1 is a conserved antiapoptotic protein that prevents ER stress-induced apoptosis⁶². BI-1 interacts, binds and suppresses IRE1α activity by canceling its endoribonuclease and kinase activity activity to promote cell survival⁶³. As an ER stress pro-survival factor⁶², BI-1 potentially compensates for the lack of heat induction of AIG1, which plays a similar role inhibiting apoptosis⁴⁶, but may be more specifically tuned to the UPR response.

Discussion

As the founding genome sequence of the Halicephalobus genus, H. mephisto illuminates previously unexplored territory. H. mephisto separated from Caenorhabditis at least 22 million years ago^52,64, and likely over 100 million years ago⁶⁵, though calibrating the nematode molecular clock is difficult because of their poor fossil record⁶⁴. Our data suggest that nematodes access the deep subsurface from surface waters facilitated by seismic activity⁶⁶. This ransition from surface to deep subsurface would be expected to exert strong selective pressures on their genomes, which in nematodes are particularly evolutionarily dynamic^6,7,10,11,12. Therefore, we speculate that the H. mephisto divergence may reflect selection more than neutral genetic drift, making it a particularly informative genome. Consistent with this, the expanded Hsp70 and AIG1 gene families are extremely divergent from earlier exemplars, residing on extended branches in phylogenetic analysis (Figs. 2 and 3), and positive selection was detected along several lineages of the Hsp70 phylogeny (Fig. 2b).

We investigated the novelty of H. mephisto’s 16,186 protein-coding genes by a combination of domain search (HMMER against the Pfam-A database), blastp against the manually curated high-confidence Uniprot-Swisprot database, blastp against the combined proteomes of 28 nematode species (listed in Methods), and Interproscan 5 analysis. At an e-value of 1e-4 we found 10,567 proteins identified by one or more of these methods, leaving 5,619 unknown genes (34.7% of 16,186) lacking domains or any recognizable homology. Nevertheless, 3,599 (64.0%) of the unknown genes are expressed (defined as having at least 5 FPKM across 12 replicates) increasing our confidence they are real genes. Genes whose expression was not detected in our analysis may be expressed at extremely low levels (below 5 FPKM across all replicates) or be expressed under different environmental conditions than the laboratory culture we employed. We conclude that these 5,619 completely unknown genes, and particularly their 3,599 expressed subset, are intriguing candidates for functional adaptation to the deep terrestrial subsurface.

These data are consistent with a previous report that new nematode genomes tend to yield around 33% proteins that are unrecognizable outside their genus⁶, given that H. mephisto is the first of its genus to be sequenced fully. As a control we examined the proteome of P. redivivus, using the identical protein-identification pipeline, finding 33.8% unknown genes, quite similar to the numbers for H. mephisto. We also tested the pine wood nematode, Bursaphelenchus xylophilus, and identified 25.6% unknown genes. Like H. mephisto, P. redivivus, and B. xylophilus are the first genomes of their respective genera to be sequenced. When a within-genus comparison is available, the number of novel genes has been reported to drop to around 10%⁶, which we tested by examining the proteins of Meloidogyne hapla, which has a within-genus match to M. incognita in the 28 nematode blast database. Consistent with predictions, we identified 8.4% unknown proteins in M. hapla. Therefore, we conclude that while the genomic plasticity we observe for H. mephisto is significant, the number of unknown genes is broadly consonant with nematode molecular systematics showing that roundworm genomes are extremely dynamic. The novelty of the H. mephisto genes reflect a combination of evolutionary adaptation and a lack of closely related comparative Halicephalobus species in databases. Supporting this, of the 1,730 genes that match nematode genome(s) only, and were not identified by Interproscan, Uniprot-Swissprot, or Pfam, 1,480 of them match P. redivivius (Supplementary Fig. 2A), the nearest sequenced nematode relative of H. mephisto (Fig. 1a). These 1730 genes are not widely conserved even among roundworms, with only 17 (1%) of them identified in all 28 species and most (511) identified in only one other species (Supplementary Fig. 2B); unsurprisingly the majority of these (399, 78%) were only found in P. redivivus.

The assembled genome of H. mephisto is smaller (61.4 Mb) than most other sequenced nematode genomes with the exception of M. hapla (54 Mb)⁶⁷ and M. incognita (47–51 Mb)⁶⁸, though similar in size to the most closely related species, P. redivivus (64.4 Mb)¹⁸. H. mephisto reproduces via parthenogenesis¹, while both M. hapla and M. incognita are facultatively parthenogenetic^67,68. In Caenorhabditis, loss of males has been linked to genome shrinkage¹¹. In M. incognita⁶⁸ and D. pachys⁶⁹ the loss of sexual reproduction leads to functional haploidization as alleles diverge into paralogs across the genome (leading to most genes being present as duplicate, divergent copies). The conversion of a diploid into a functional haploid genome is associated with three predominant changes: (1) a high degree of heterozygosity as lack of recombination leads to high divergence between alleles, now paralogs, (2) assembly of a haploid genome, and (3) detection of two copies of most genes that are single-copy in other organisms^68,69.

The heterozygosity we observed in H. mephisto is modest: the kmer frequency distribution from Illumina reads shows two prominent peaks consistent with approximately 1% heterozygosity⁷⁰ (Supplementary Fig. 3A) and mapping reads back to the assembly identified 707,190 snps and 55,683 indels (762,873 total variants) confirming an overall snp heterozygosity of 1.15%. In contrast, D. pachys displays 4% heterozygosity⁶⁹.

The H. mephisto genome assembly we obtained is largely haplotype-merged: mapping the reads back to the genome shows that 59.7 Mb of the assembled sequence (97%) is at haplotype-merged coverage (102×) and only 1.7 Mb (3%) exists as potentially diverged haplotypes at lower coverage (Supplementary Fig. 3B). In agreement, CEGMA analysis of assembled reads (Supplementary Fig. 3C) and fragments (Supplementary Fig. 3D) are predominantly single peaks at approximately 100× coverage.

Allele-to-paralog conversion results in two recognizably distinct gene copies, or paralogs, in the genome. In the D. pachys genome CEGMA found an average of 2.12 copies of each core eukaryotic ortholog⁶⁹ yet in H. mephisto CEGMA^13,14 reported an average 1.19 copies of each gene while BUSCO¹⁶ identified 97% single-copy genes. We therefore conclude that H. mephisto exhibits very little functional haploidization into paralogs, and it may represent an early evolutionary stage in this process.

It is important to verify that amplified Hsp70 and AIG1 gene families do not represent assembly errors. One way this could happen is if an assembler produces multiple overlapping contigs encoding the same locus, leading to artifactually high gene copy numbers. In that case multiple near-identical copies of the genes would be present. To test this, we extracted the corresponding nucleotide sequences for each family and performed within-family, all-vs.-all blastn at an evalue of 1e-4 and filtered for nonself matches. These nonself matches were relatively divergent: the best nonself within-family blastn match for 112 Hsp70 loci averaged 86.4% identity at the nucleotide level (Supplementary Fig. 4A) and for 63 AIG1 loci averaged 89.7% identity (Supplementary Fig. 4B). Given the observed genomic heterozygosity is only 1.15%, these data suggest that the Hsp70 and AIG1 genes are diverged paralogs and neither allelic copies nor redundant misassemblies. As a control we extracted 65 collagen genes and performed the same analysis. As might be predicted for an ancient and highly divergent gene family, 74% of collagen genes lacked within-family nonself blast matches entirely, though those that matched averaged 86.6%, similar to Hsp70 (Supplementary Fig. 4C). These sequence divergences are greater than a control analysis performed by blast of the assembly to itself (Supplementary Fig. 4D). We conclude that the expanded Hsp70 and AIG1 families represent parology rather than assembly redundancy artifact.

Often repetitive elements are collapsed by assemblers, so we checked that Hsp70 and AIG1 are not collapsed repeats, which would cause under-representation of their family diversity. Mapping raw reads onto the assembled genome indicates the collapsed repeats as regions of elevated coverage⁷¹. By mapping the raw reads back to the H. mephisto genome we found that the coverage of both Hsp70 and AIG1 families are not elevated relative to the entire genome (Supplementary Fig. 3E). Overall, we conclude that the Hsp70 and AIG1 genes identified in this study are true paralogs, neither over nor underrepresented in the genome.

D. pachys is remarkable for having fused its chromosomes together into a single linkage group and eliminating telomeres. However, over 6,000 telomeric repeat-containing reads (at least four copies of TTAGGC_, the C. elegans telomeric repeat⁷²) were present in the raw Illumina data from H. mephisto. By extracting read pairs with telomeric repeats in at least one read, merging them with PEAR⁷³, and assembling them with MIRA we were able to identify 7 unique subtelomeric regions, suggesting that while the number of H. mephisto chromosomes may be reduced relative to C. elegans, they are not fused as in D. pachys. Consistent with this, we find two homologs of the telomeric gene Protection of Telomeres (POT1) in H. mephisto relative to the three in C. elegans, all of which are lost in D. pachys⁶⁹. Also in contrast to D. pachys, we were able to identify Telomerase Reverse Transcriptase (TERT) in H. mephisto, suggesting that standard telomeres have been retained in the subterrestrial organism and chromosome fusion is not an inevitable consequence of a parthenogenetic lifestyle.

Gene expression provides clues to the adaptation to the warm subterrestrial environment. Among transcripts whose expression changed significantly on exposure to heat, over twice as many are downregulated (675) as upregulated (285) (Fig. 4b). We suggest two potential explanations of this phenomenon: the worms may find lower temperatures more stressful given their native conditions are warmer, and therefore activate more genes at 25 °C relative to 38–40 °C. Or, the downregulation of genes may be itself an adaptation to heat, an idea consistent with the regulated IRE1α dependent decay (RIDD) pathway of the unfolded protein response (UPR)^74,75. When the RIDD pathway is activated, degradation of ribosome-bound transcripts is mediated by the endoribonuclease domain of IRE1α⁷⁴, relieving the immediate protein synthesis demand on the ER, and providing existing proteins time to refold^63,74. Our data cannot definitively distinguish these two theories; however, the elevated expression of protein chaperones like Hsp70 under heat exposure (Fig. 4a) supports the model in which observed changes in transcriptional profile reflect an adaptive response to higher, rather than lower, temperatures. Combined with the observed heat-induced expression of the antiapoptotic factor Bax Inhibitor 1 (BI-1) (Fig. 4b) this response would help H. mephisto survive the abiotic heat stress of the subterrestrial environment.

While we report significantly expanded Hsp70 and AIG1 families, only the Hsp70 genes are upregulated under heat stress in the laboratory (Fig. 4a). Interestingly the worms appear to keep per-gene Hsp70 expression low, even under conditions of heat stress. The per-gene expression of Hsp70 at high temperature is slightly lower than all genes (median Hsp70 FPKM = 2.04, and 2.58 for all genes, Fig. 4a), but the dramatic expansion of Hsp70 paralogs effectively elevates the gene dosage for Hsp70 to 51-fold higher than a single-copy gene would be. This is a similar order of induction as the ARMET protein, an UPR-related single-copy gene induced 44-fold (Fig. 4b), and which may synergize with Hsp70. While many studies have shown induction of Hsp70 genes upon heat-shock, the short-term exposure of non-heat-adapted organism to brief extreme heat^31,32,33,34, organisms under long-term adaptation to heat tend to minimize overexpression of Hsp70 because it has harmful effects on development, fertility, and growth^29,76,77,78. Long-term exposure to heat stress led to downregulated Hsp70 expression in both flies⁷⁹ and fish⁸⁰. The subterrestrial environment of H. mephisto is thermostable over time: four readings during a 5-year period showed the water temperature was an average of 36.8 ± 1.2 °C^2,3,4,81,82. Similarly, we cultured the worms for 2–4 weeks at constant temperatures (25, 38, or 40 °C) in the laboratory for RNA isolation and gene expression analysis. Therefore, H. mephisto’s sustained expression of Hsp70 under conditions of constant stable heat stress implies a functional bypass of the genes’ known detrimental effects on growth and development²⁹. We hypothesize that the divergent Hsp70 genes in H. mephisto that respond most strongly to elevated temperatures may have been functionally modified to ameliorate these deleterious effects, marking them as important candidates for future study.

We were surprised to find that the suite of expanded AIG1 genes in H. mephisto are not activated by heat (Fig. 4a). These genes respond to abiotic stress, including heat, in Arabidopsis⁴⁸ and in mammals the proteins are involved in immune system function, including inhibiting apoptosis during T-cell maturation⁴⁵. Given that H. mephisto has dramatically expanded AIG1 copy numbers, it is tempting to speculate that these genes may be involved in responding to hypoxia or other abiotic nonthermal stresses present in the deep terrestrial subsurface, where their pro-survival functions should be adaptive. However, this hypothesis remains to be tested in future experiments. Remarkably, however, heat induced another anti-apoptotic factor in H. mephisto--the Bax Inhibitor 1, BI-1⁶², suggesting a different method for blocking apoptotic response under subterrestrial heat stress. These data suggest H. mephisto has adapted to the subterrestrial environment by managing unfolded protein stresses while upregulating Hsp70 and inhibiting apoptosis.

We note that the pacific oyster Crassostrea gigas has convergently expanded Hsp70 and AIG1 gene families⁸³ and activates the UPR in response to abiotic stress including heat⁸⁴, so H. mephisto helps define a general evolutionary adaptive response to heat stress⁸⁵. While the oyster experiences considerable thermal fluctuation, H. mephisto does not, as described above. Therefore, the signature of adaptation we report is not limited to cyclical or temporary temperature fluctuations but extends to adaptation to constant warm environments.

The expansion of Hsp70 is shared, also convergently, by distantly related Diploscapter species, soil nematodes which display pronounced thermotolerance^37,39, and we show here that positive selection is detectable in Hsp70 lineages in both Diploscapter and H. mephisto.These findings may not only relate to environmental heat: D. coronatus has been reported as a facultative parasite of humans, so it must survive to 37 °C, human body temperature³⁸. Consistent with this, the closest relative of H. mephisto is a deadly horse parasite, H. gingivalis, which has not been fully sequenced, and has also been reported as a facultative and fatal parasite of humans⁸⁶. Therefore, the genome signature of adapation to heat in H. mephisto, D. coronatus, and D. pachys may serve as a preadaptational bridge to parasitic lifestyles at least in some lineages. Therefore, these genomic adaptive strategies are of significant concern to human and animal health, and as our climate warms, it will be increasingly important to understand their evolutionary dynamics.

Methods

H. mephisto culture and isolation of DNA and RNA

The methods in this work comply with all relevant ethical regulations for research. H. mephisto were cultured by standard C. elegans methods⁸⁷, on agar plates seeded with Escherichia coli OP50. For DNA extraction the NucleoSpin kit (Cat #740952.250, Macherey-Nagel, Bethlehem, PA, USA) was used. The pellet was resuspended in 540 μl T1 buffer supplemented with 10 μl Proteinase K. Lysis was accomplished by 4 cycles of rapid freeze-thaw using a dry ice-ethanol bath, with thawing on a 56 °C heatblock; cycles were approximately 1 min per freeze or thaw step. After this the sample was treated with an additional 25 μl proteinase K overnight at 56 °C. After this the manufacturer’s protocol was followed for column purification of high-quality DNA, which was verified by gel electrophoresis prior to library construction.

For RNA, the worms were cultured on 5% agar plates at 25, 38, or 40 °C for 2–4 weeks prior to harvest, then pelleted in PBS pH 7.7, and flash-frozen or immersed in DNA/RNA Shield Buffer for storage and extraction of nucleic acids. For RNA extraction Zymo’s Duet DNA/RNA MiniPrep Plus kit (Cat # D7003) was used. The worm pellet was resuspended in 300 μl of DNA/RNA Shield buffer, transferred to a tube of 0.5 mm BashingBeads (Zymo Cat # S6002), and homogenized on a vortexer at maximum speed twice for 5 min with a 1–2 min rest period between (for cooling). After this 30 μl of PK buffer and 15 μl Proteinase K were added, and the solution incubated for 30 min at 55 °C, then supplemented with 345 μl lysis buffer. After pelleting the insoluble material at 16,000g for 1 min, the supernatant was transferred to yellow (for DNA) and green (for RNA) columns as described in manufacturer’s protocol for the Duet DNA/RNA MiniPrep Plus Kit, performing in-column DNAse treatment of RNA as recommended.

Based on Agilent BioAnalyzer 2100 output, a total of 12 samples yielded RNA of sufficient quality for sequencing: 3 from 25 °C, 3 from 38 °C, and 6 from 40 °C. For analysis the 38 °C and 40 °C samples were considered together as “high” temperature and the 25 °C replicates as “normal” temperature.

Genomic DNA sequencing

For Illumina, a TruSeq library with insert size of 387 bp was generated with 9 cycles of PCR after gel purification. This library was paired-end sequenced on a HiSeq2500 with 215 bp reads, yielding 58.7 million pairs. This data was assembled with Platanus⁷⁰ as described in the next section. For PacBio, 30 lanes were run on the RS II system using libraries generated off the same DNA sample used in Illumina.

RNA sequencing

Stranded RNA-seq libraries were generated using the KAPA Total Stranded preparation kit (KAPA Biosystems catalog # KK8484) with an average insert size of 175 bp. Samples were sequenced on a HiSeq 2500 in High Output mode using 2 × 50 bp paired-end reads, generating at least 60 M total reads per sample. Preliminary quality control for read mapping was performed using taxMaps version 0.2.1⁸⁸, on a downsampled subset (1%) with the NCBI BLAST nt and a kmer size of 75 to confirm species etiology for the reads generated.

Genome assembly

Raw read kmer analysis was performed with the SoapEC v. 2.01 (from the SOAPdenovo2 package⁸⁹), KmerFreq_HA set to kmer 23 and error corrected with Corrector_HA. We found that platanus assembly was optimial at 100× average coverage, so a total of 15,582,039 random paired, error-corrected reads were used in the final assembly. Platanus version 1.2.4⁷⁰ was used to assemble with a stepsize 2, kmer of 21, and −u 1. Platanus scaffolding was performed with settings −n 345, −a 386, and −d 62. Subsequently gaps were closed with Platanus gapclose. Scaffolds less than 500 bp were discarded and this assembly was further scaffolded with 30 lanes of PacBio data using the PBJelly component of PBSuite v. 15.8.24. Reapr v. 1.0.18 (perfectmap, −b) was used to break chimeras and erroneous gaps. We removed sequences under 1000 bp and identified 40 bacterial scaffolds by a combination of coverage (less than 26×) and GC content (over 55%), which blast confirmed as prokaryotic. These sequences were found to encode a complete sphingomonas genome to be described elsewhere, but does not appear to be a deep subterrestrial inhabitant based on metagenomic borehole read mapping. After additional removal of the mitochondrial scaffold, this final H. mephisto nuclear genome of 880 scaffolds was used in all further analysis. This assembly has an N50 of 313 kb, with the longest 2.55 Mb and is highly contiguous: the final assembly encodes only 10 gaps encoding 476 bp (0.0008% of sequence).

Heterozygosity

The error corrected reads were mapped back to the final H. mephisto assembly with bwa-mem v.0.7.12 and mis-mapped reads and PCR duplicates were removed with samtools v. 1.9. Using the remaining 31,867,988 reads, snp variants were called with bcftools v. 1.9 mpileup and then call command with the −mv flag.

Analysis of within-family nonself blastn matches

Gene family coding sequences were extracted into a fasta file based on the transcript coordinates from the GFF file produced by gene annotation with Maker2 and Stringtie. These fasta files were used to build blast databases. These databases were each queried using blastn (at 1e−4) using the same fasta file as query that was used to build the database (thus, performing all-vs.-all blastn). The resultant blast output in tabular format (−outfmt 6) was parsed using a custom python script to isolate only the first (best) non-self match from the blast output, and a histogram was generated of the percent identies of these nonself matches. For the genomic assembly all-vs.-all comparison, the same process was carried out but using the 880-contig full genome assembly.

Analysis of coverage

Genome-wide coverage was calculated using the samtools v. 1.9 depth command on the bamfile generated for heterozygosity analysis, followed by custom parsing of the coverage file with a python script. The same coverage file was parsed using the unique transcript coordinates as described in the previous section. The per-basepair coverage values for Hsp70, AIG1, and the entire genome were evaluated with custom python script producing a boxplot shown in the figure.

Analysis of repetitive sequence

RepeatModeler⁹⁰ v.1.0.11 was used to create a custom repeat library. This library was screened for accuracy with HMMER⁹¹ v 3.1b2 to identify mis-classifed protein-coding genes, which were removed. This library was used in a RepeatMasker⁹⁰ v 4.0.6 run using the default parameters. The initial RepeatMasker run designated 21.07% of the genome as consisting of transposable elements, of which 18.48% of the total genome, or 87.7% of the repeat segments, as unclassified repeats. Subsequently, nhmmer¹⁹ analysis was run on the identified repeats using the DFAM database⁹² with e-value set to 1e⁻² to accommodate the highly divergent genome.

Gene discovery

Maker2²⁰ version 2.38.1 was utilized to run Augustus and SNAP as ab initio predictors to make comprehensive gene predictions for H. mephisto, and incorporating 28 nematode proteomes as hints along with the RNA-seq data. These gene predictions were refined utilizing Tophat2- StringTie-Ballgown suites of programs^21,22, which also estimate expression levels. Tophat2 v.2.1.1 was used to align the RNA-seq data against the H. mephisto genome with Maker2 predicted genes as a input.gff3 file. The resulting.bam files were fed into StringTie v. 1.3.4, to generate a transcriptome annotation of each, as well as quantify the expression levels and estimate the abundance of each transcript, which were subsequently unified using Stringtie’s merge function²². Ballgown v. 2.12.0 plots the gene abundance and expression data for visualization, from the StringTie output data²².

Together StringTie and Maker2 predicted 34,605 transcripts across 12 different RNAseq datasets, which map to a distinct set of 17,209 unique loci as defined by gffcompare. From these loci the longest protein sequence predicted by TransDecoder²³ v. 5.3.0, was used in domain comparisons with C. elegans, the reference proteome UP000001940_6239 (19,922 nonredundant proteins) from Ensembl RELEASE 2018_04. TransDecoder was run in strict mode, requiring at least 50 amino acids, and only the single best cds prediction per transcript retained, to obtain the nonredundant set of 16,186 protein-coding genes.

The 28 nematode proteomes used were obtained from WormBase Parasite, https://parasite.wormbase.org and their accessions are listed in Supplementary Table 1.

Gene expression

After gene discovery, expression analysis was carried out using Ballgown⁹³ v. 2.12.0 following the protocol as described²². Genes with less than 5 total reads across all 12 replicates were filtered from the analysis. Replicates were grouped into “high” (38–40 °C) or “low” (25 °C) and the Ballgown stattest() function used to identify those genes statistically different between high and low temperatures. The output of stattest() include q value and fold-change and these were used to generate the volcano plot as a scatterplot with ggplot2 in R. Significantly heat-regulated genes were defined as exhibiting a q value less than 0.05 and upregulated or downregulated at least twofold under heat-stress conditions (38–40 °C) relative to 25 °C controls. For the boxplots of gene expression, the FPKM values at high or low temperature were exported as a text file and imported into Python 2.7.14 where a custom script was used to construct the boxplots with matplotlib 2.2.2.

Analysis of unknown genes

H. mephisto genes were analyzed by blastp (evalue 1e-4) against a collection of 28 nematodes (Supplementary Table 1). When blasting with controls P. redivivus, M. hapla, or B. xylophilus we created a separate database removing itself (to avoid the trivial self-matching) and replaced it with the H. mephisto proteome, keeping a 28 nematode species comparison set. For all analyses we also performed blastp against the uniprot-swissprot manually curated database (1e-4), Hmmer domain search against the PfamA database (1e-4), and Interproscan 5.30–69.0 running TIGRFAM 15.0, Hamap 2018_03, SMART 7.1, PRINTS 42.0, and Pfam 31.0. Custom python scripts were used to combine output of all analyses and identify true unknown genes.

Venn diagram

The genome of each species was uploaded onto the selected template on the website OrthoVenn (http://www.bioinfogenome.net/OrthoVenn/).

Domain comparisons

HMMER⁹⁴ was used to identify protein domains from both H. mephisto and C. elegans nonredundant protein predictions with evalue 1e-10. Domain counts of H. mephisto were compared to C. elegans using a custom python script.

Multilocus phylogenetic tree

Phylogenetic relatedness of Halicephalobus mephisto relative to other nematode species in Clade IV was determined by constructing a maximum likelihood tree of single-copy orthologous genes. Protein-coding sequences of 21 nematodes were downloaded from WormBase (release WBPS9) on March 26, 2018. They are B. xylophilus, Ditylenchus destructor, Globodera pallida, Globodera rostochiensis, Meloidogyne floridensis, M. hapla, P. redivivus, Parastrongyloides trichosuri, Rhabditophanes sp. KR3021, Steinernema carpocapsae, S. feltiae, S. glaseri, S. monticolum, S. scapterisci, Strongyloides papillosus, S. ratti, S. stercoralis, and S. venezuelensis. The three outgroup proteomes were Clade V nematodes, namely Diploscapter pachys PF1309 (NCBI project number PRJNA280107), Heterorhabditis bacteriophora (NCBI project number PRJNA438576), and C. elegans (WormBase release WS264). Following the procedures of OrthoMCL v2.0.9^95,96, the proteome files were modified to the required format by OrthoMCL (step5), and then filtered to remove sequences that are shorter than 10 amino acid residues and have less than 20% of stop codons (step6). An All-vs.-All BLAST⁹⁷ search among all proteomes was performed as suggested by OrthoMCL step7, which involved creating a BLAST-searchable protein database with masking information, followed by a BLASTp search with an e-value threshold of 1e-5 (−evalue 1e-5), and results were stored in a tab-delimited file (−outfmt 6). The BLASTp results were parsed using orthAgogue v1.0.3⁹⁸, which filtered out protein pairs with overlap less than 50% (−o 50) and BLAST (or bit) scores below 50 (−u 50) before identifying valid protein pairs. This step substituted OrthoMCL step8 to step11. The resultant orthologs.abc file was then used for clustering (inflation index, −I = 2.0) and creating orthologous groups (OrthoMCL step 12–13). Orthologous groups that contained a single protein sequence from each of the 22 genomes were considered as SCOGs. Sequences of 99 identified SCOGs were aligned individually using MUSCLE v3.8.31⁹⁹ and the default parameters, and trimmed using trimAl¹⁰⁰ to remove residual positions that were shared by less than 50% of the sequences in the multiple sequence alignments (−gt 0.5). Trimmed alignments were manually examined to make sure there was no spurious sequences or poorly aligned regions, and each was evaluated by ProtTest v3.4.2¹⁰¹ to identify for the best substitution model. Sequences of these 99 SCOGs were concatenated by taxa. Using this final multiple sequence alignment of 43,188 amino acid positions (including 34,549 distinct patterns) and the specific best substitution model identified for individual SCOGs, partitioned phylogenetic analysis was performed using raxml-ng v0.4.1b to find the best maximum-likelihood tree. The substitution models used were JTT + G, JTT + G + F, JTT + I + G, LG + G, LG + G + F, LG + I + G, LG + I + G + F, RtREV + I + G + F, VT + I + G, WAG + I + G, and WAG + I + G + F. Robustness of tree topology was evaluated by 100 iterations of bootstrap analysis.

GO analysis

The find_enrichment.py script from the GOATOOLS v0.6.10 package¹⁰² was used under default settings to examine the 285 upregulated and 675 downregulated genes relative to the entire set of proteins. GO terms were assigned using Interproscan 5.30.69¹⁰³.

Hsp70 and AIG1 tree building

For Hsp70 full-length proteins were aligned, and for AIG1 the proteins were broken into domains using the envelope coordinates provided by HMMER and a custom Python script. We labeled the domains by order within the original protein. Alignments performed with MAFFT¹⁰⁴ v.7.017 and refined manually to minimize indels. Trees were generated with MrBayes¹⁰⁵ v.3.2.6, using the blosum rate matrix and invgamma rate variation; and RAxML 8.2.12¹⁰⁶ with the PROTCATBLOSUM62 rate matrix and 200 bootstrap replicates. All non-mephisto sequences in Fig. 2 are identified using their Wormbase ParaSite (https://parasite.wormbase.org/index.html) gene identifier. Accessions are given in Supplementary Table 1. For Fig. 3 we prepended non-mephisto sequences with the shortest possible (normally two letter) genus and species abbreviation (e.g., “Dp” for Diploscapter pachys) followed by “^” prior to the Wormbase ParaSite gene identitifer. Non-nematode gene sequences are indicated by their NCBI accession numbers and UniProtKB identifier for R. irregularis.

PAML with branch-sites model

Using Codeml (PAML v.4.9)^41,107 branch site ω was estimated using runmode = 0, seqtype = 1, model = 2, NSsites = 2, in the Codeml control file. All other parameters were used on the default setting. Each branch was estimated twice: once with a neutral model (fix_omega = 1 and omega = 1) and once using a purifying selection model (fix_omega = 0, omega = 1). The p values were determined using the LRT statistic 2Δl compared against χ² with critical values of 3.84, 5% significance level, and 6.63, for 1% significance¹⁰⁸. In our analysis there were (after gap removal) 891 sites across 55 different aligned nucleotide (coding) sequences.

Analysis of subtelomeric sequences

Extraction of telomere repeat-containing read pairs (at least one of which contains at least four copies of TTAGGC) from raw Illumina data was performed using a custom Python script. These read pairs were merged at their overlaps using PEAR⁷³ v0.9.8, and the resultant fused reads were assembled with MIRA¹⁰⁹ v4.0. The resulting contigs were manually inspected and redundant sequences were collapsed to obtain an estimate of subtelomeric region number.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession SWDT00000000. The version described in this paper is version SWDT01000000. The raw Illumina DNA and RNA data, and PacBio DNA data are available on the Sequence Reads Archive (SRA) at accession PRJNA528747. Transcriptome expression data as FPKM values genome-wide is available in the Gene Expression Omnibus (GEO) at accession GSE133178. The genome annotation file and transcript and protein predictions are available from WormBase Parasite at accession Halicephalobus_mephisto_prjna528747. The source data underlying Fig. 4A, B are provided as a Source Data File.

References

Borgonie, G. et al. Nematoda from the terrestrial deep subsurface of South Africa. Nature 474, 79–82 (2011).
Article ADS CAS PubMed Google Scholar
Lau, M. C. Y. et al. An oligotrophic deep-subsurface community dependent on syntrophy is dominated by sulfur-driven autotrophic denitrifiers. Proc Natl Acad. Sci. USA 113, E7927–E7936 (2016).
Article CAS PubMed PubMed Central Google Scholar
Magnabosco, C. et al. Fluctuations in populations of subsurface methane oxidizers in coordination with changes in electron acceptor availability. FEMS Microbiol. Ecol. 94, https://doi.org/10.1093/femsec/fiy089 (2018).
Lau, M. C. Y. et al. Phylogeny and phylogeography of functional genes shared among seven terrestrial subsurface metagenomes reveal N-cycling and microbial evolutionary relationships. Front. Microbiol. 5, ARTN 53110.3389/fmicb.2014.00531 (2014).
Borgonie, G. et al. Eukaryotic opportunists dominate the deep-subsurface biosphere in South Africa. Nat. Commun. 6, 8952 (2015).
Article ADS CAS PubMed Google Scholar
Rödelsperger, C., Streit, A. & Sommer, R. J. In eLS (John Wiley & Sons, Ltd., 2013).
Wasmuth, J., Schmid, R., Hedley, A. & Blaxter, M. On the extent and origins of genic novelty in the phylum Nematoda. PLoS Negl. Trop. Dis. 2, e258 (2008).
Article PubMed PubMed Central CAS Google Scholar
Coghlan, A. Nematode genome evolution. WormBook, 1–15, https://doi.org/10.1895/wormbook.1.15.1 (2005).
Borgonie, G. et al. Refuge from predation, the benefit of living in an extreme acidic environment? Biol. Bull. 219, 268–276 (2010).
Article CAS PubMed Google Scholar
Hunt, V. L. et al. The genomic basis of parasitism in the Strongyloides clade of nematodes. Nat. Genet. 48, 299–307 (2016).
Article CAS PubMed PubMed Central Google Scholar
Yin, D. et al. Rapid genome shrinkage in a self-fertile nematode reveals sperm competition proteins. Science 359, 55–+, (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Lespinet, O., Wolf, Y. I., Koonin, E. V. & Aravind, L. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 12, 1048–1059 (2002).
Article CAS PubMed PubMed Central Google Scholar
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
Article CAS PubMed Google Scholar
Parra, G., Bradnam, K., Ning, Z., Keane, T. & Korf, I. Assessing the gene space in draft genomes. Nucleic Acids Res. 37, 289–297 (2009).
Article CAS PubMed Google Scholar
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Article CAS PubMed PubMed Central Google Scholar
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Article CAS PubMed Google Scholar
Blaxter, M. Nematodes: the worm and its relatives. PLoS Biol. 9, e1001050 (2011).
Article CAS PubMed PubMed Central Google Scholar
Srinivasan, J. et al. The draft genome and transcriptome of Panagrellus redivivus are shaped by the harsh demands of a free-living lifestyle. Genetics 193, 1279–+ (2013).
Article CAS PubMed PubMed Central Google Scholar
Wheeler, T. J. & Eddy, S. R. nhmmer: DNA homology search with profile HMMs. Bioinformatics 29, 2487–2489 (2013).
Article CAS PubMed PubMed Central Google Scholar
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinforma. 12, 491 (2011).
Article Google Scholar
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
Article CAS PubMed PubMed Central Google Scholar
Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. & Salzberg, S. L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 11, 1650–1667 (2016).
Article CAS PubMed PubMed Central Google Scholar
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
Article CAS PubMed Google Scholar
Chervitz, S. A. et al. Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science 282, 2022–2028 (1998).
Article ADS CAS PubMed PubMed Central Google Scholar
Thomas, J. H. Analysis of homologous gene clusters in Caenorhabditis elegans reveals striking regional cluster domains. Genetics 172, 127–143 (2006).
Article CAS PubMed PubMed Central Google Scholar
Wang, Y., Coleman-Derr, D., Chen, G. & Gu, Y. Q. OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Res. 43, W78–W84 (2015).
Article CAS PubMed PubMed Central Google Scholar
Assimon, V. A., Gillies, A. T., Rauch, J. N. & Gestwicki, J. E. Hsp70 protein complexes as drug targets. Curr. Pharm. Des. 19, 404–417 (2013).
Article CAS PubMed PubMed Central Google Scholar
Sherman, M. Y. & Gabai, V. L. Hsp70 in cancer: back to the future. Oncogene 34, 4153–4161 (2015).
Article CAS PubMed Google Scholar
Sorensen, J. G., Kristensen, T. N. & Loeschcke, V. The evolutionary and ecological role of heat shock proteins. Ecol. Lett. 6, 1025–1037 (2003).
Article Google Scholar
Usman, M. G. et al. Molecular analysis of Hsp70 mechanisms in plants and their function in response to stress. Biotechnol. Genet Eng. 33, 26–39 (2017).
Article CAS Google Scholar
Brunquell, J., Morris, S., Lu, Y., Cheng, F. & Westerheide, S. D. The genome-wide role of HSF-1 in the regulation of gene expression in Caenorhabditis elegans. BMC Genomics 17, ARTN 55910.1186/s12864-016-2837-5 (2016).
Lakhotia, S. C., Srivastava, P. & Prasanth, K. V. Regulation of heat shock proteins, Hsp70 and Hsp64, in heat-shocked Malpighian tubules of Drosophila melanogaster larvae. Cell Stress Chaperones 7, 347–356 (2002).
Article CAS PubMed PubMed Central Google Scholar
Prahlad, V., Cornelius, T. & Morimoto, R. I. Regulation of the cellular heat shock response in Caenorhabditis elegans by thermosensory neurons. Science 320, 811–814 (2008).
Article ADS CAS PubMed PubMed Central Google Scholar
Rodriguez, M., Snoek, L. B., De Bono, M. & Kammenga, J. E. Worms under stress: C. elegans stress response and its relevance to complex human disease and aging. Trends Genet. 29, 367–374 (2013).
Article CAS PubMed Google Scholar
Yu, A. et al. Roles of Hsp70s in stress responses of microorganisms, plants, and animals. Biomed. Res. Int. 2015, 510319 (2015).
PubMed PubMed Central Google Scholar
Brocchieri, L., de Macario, E. C. & Macario, A. J. L. hsp70 genes in the human genome: conservation and differentiation patterns predict a wide array of overlapping and specialized functions. BMC Evol. Biol. 8, Artn 1910.1186/1471-2148-8-19 (2008).
Gibbs, D. S., Anderson, G. L., Beuchat, L. R., Carta, L. K. & Williams, P. L. Potential role of Diploscapter sp strain LKC25, a bacterivorous nematode from soil, as a vector of food-borne pathogenic bacteria to preharvest fruits and vegetables. Appl. Environ. Micro. 71, 2433–2437 (2005).
Article CAS Google Scholar
Chandler, A. C. Diploscapter coronata as a facultative parasite of man, with a general review of vertebrate parasitism by rhabditoid worms. Parasitology 30, 44–55 (2009).
Article Google Scholar
Lemzina, L. V. & Gagarin, V. G. I. o. B., Kyrghyzian Academy of Sciences, Chou pr. 265, Bishkek, Kyrghyzstan). New species of free-living nematodes from thermal waters in Kyrghyzstan. v. 3 (1994).
Yang, Z. Inference of selection from multiple species alignments. Curr. Opin. Genet. Dev. 12, 688–694 (2002).
Article CAS PubMed Google Scholar
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Article CAS PubMed Google Scholar
Yang, Z., Wong, W. S. & Nielsen, R. Bayes empirical bayes inference of amino acid sites under positive selection. Mol. Biol. Evol. 22, 1107–1118 (2005).
Article CAS PubMed Google Scholar
Perneger, T. V. What’s wrong with Bonferroni adjustments. Br. Med. J. 316, 1236–1238 (1998).
Article CAS Google Scholar
Reuber, T. L. & Ausubel, F. M. Isolation of arabidopsis genes that differentiate between resistance responses mediated by the RPS2 and RPM1 disease resistance genes. Plant Cell 8, 241–249 (1996).
CAS PubMed PubMed Central Google Scholar
Nitta, T. et al. IAN family critically regulates survival and development of T lymphocytes. PLoS Biol. 4, e103 (2006).
Article PubMed PubMed Central CAS Google Scholar
Wang, Z. & Li, X. IAN/GIMAPs are conserved and novel regulators in vertebrates and angiosperm plants. Plant Signal. Behav. 4, 165–167 (2009).
Article CAS PubMed PubMed Central Google Scholar
Blaxter, M. & Koutsovoulos, G. The evolution of parasitism in Nematoda. Parasitology 142(Suppl 1), S26–S39 (2015).
Article PubMed Google Scholar
Liu, C., Wang, T., Zhang, W. & Li, X. Computational identification and analysis of immune-associated nucleotide gene family in Arabidopsis thaliana. J. Plant Physiol. 165, 777–787 (2008).
Article CAS PubMed Google Scholar
Lin, K. et al. Single nucleus genome sequencing reveals high similarity among nuclei of an endomycorrhizal fungus. Plos Genet. 10, ARTN e100407810.1371/journal.pgen.1004078 (2014).
Nitta, T. & Takahama, Y. The lymphocyte guard-IANs: regulation of lymphocyte survival by IAN/GIMAP family proteins. Trends Immunol. 28, 58–65 (2007).
Article CAS PubMed Google Scholar
Steiner, G. Opuscula miscellanea nematologica. IX. Proc. Helminthol. Soc. Wash. 9, 32–34 (1942).
Google Scholar
McGill, L. M., Fitzpatrick, D. A., Pisani, D. & Burnell, A. M. Estimation of phylogenetic divergence times in Panagrolaimidae and other nematodes using relaxed molecular clocks calibrated with insect and crustacean fossils. Nematology 19, 899–+ (2017).
Article CAS Google Scholar
Okimoto, R., Macfarlane, J. L., Clary, D. O. & Wolstenholme, D. R. The mitochondrial genomes of two nematodes, Caenorhabditis elegans and Ascaris suum. Genetics 130, 471–498 (1992).
CAS PubMed PubMed Central Google Scholar
Parkinson, J. & Blaxter, M. SimiTri–visualizing similarity relationships for groups of sequences. Bioinformatics 19, 390–395 (2003).
Article CAS PubMed Google Scholar
Friedman, R. & Ely, B. Codon usage methods for horizontal gene transfer detection generate an abundance of false positive and false negative results. Curr. Microbiol. 65, 639–642 (2012).
Article CAS PubMed Google Scholar
Gallegos, M. E. & Bargmann, C. I. Mechanosensory neurite termination and tiling depend on SAX-2 and the SAX-1 kinase. Neuron 44, 239–249 (2004).
Article CAS PubMed Google Scholar
Wormbase. (http://www.wormbase.org).
Mizobuchi, N. et al. ARMET is a soluble ER protein induced by the unfolded protein response via ERSE-II element. Cell Struct. Funct. 32, 41–50 (2007).
Article CAS PubMed Google Scholar
Glembotski, C. C. et al. Mesencephalic astrocyte-derived neurotrophic factor protects the heart from ischemic damage and is selectively secreted upon sarco/endoplasmic reticulum calcium depletion. J. Biol. Chem. 287, 25893–25904 (2012).
Article CAS PubMed PubMed Central Google Scholar
Apostolou, A., Shen, Y., Liang, Y., Luo, J. & Fang, S. Armet, a UPR-upregulated protein, inhibits cell proliferation and ER stress-induced cell death. Exp. Cell Res. 314, 2454–2467 (2008).
Article CAS PubMed PubMed Central Google Scholar
Zhang, Z. et al. MANF protects dopamine neurons and locomotion defects from a human alpha-synuclein induced Parkinson’s disease model in C. elegans by regulating ER stress and autophagy pathways. Exp. Neurol. 308, 59–71 (2018).
Article CAS PubMed Google Scholar
Cai, J. et al. Bax inhibitor-1 from orange spotted grouper, Epinephelus coioides involved in viral infection. Fish. Shellfish Immunol. 78, 91–99 (2018).
Article CAS PubMed Google Scholar
Sano, R. & Reed, J. C. ER stress-induced cell death mechanisms. Biochim. Biophys. Acta 1833, 3460–3470 (2013).
Article CAS PubMed Google Scholar
Cutter, A. D. Divergence times in Caenorhabditis and Drosophila inferred from direct estimates of the neutral mutation rate. Mol. Biol. Evol. 25, 778–786 (2008).
Article CAS PubMed Google Scholar
Stein, L. D. et al. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol. 1, E45 (2003).
Article PubMed PubMed Central CAS Google Scholar
Borgonie, G. et al. New ecosystems in the deep subsurface follow the flow of water driven by geological activity. Sci. Rep. 9, 3310 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Opperman, C. H. et al. Sequence and genetic map of Meloidogyne hapla: a compact nematode genome for plant parasitism. Proc. Natl Acad. Sci. USA 105, 14802–14807 (2008).
Article ADS CAS PubMed PubMed Central Google Scholar
Abad, P. et al. Genome sequence of the metazoan plant-parasitic nematode Meloidogyne incognita. Nat. Biotechnol. 26, 909–915 (2008).
Article CAS PubMed Google Scholar
Fradin, H. et al. Genome architecture and evolution of a unichromosomal asexual nematode. Curr. Biol. 27, 2928–2939 e2926 (2017).
Article CAS PubMed PubMed Central Google Scholar
Kajitani, R. et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 24, 1384–1395 (2014).
Article CAS PubMed PubMed Central Google Scholar
Treangen, T. J. & Salzberg, S. L. Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat. Rev. Genet. 13, 36–46 (2011).
Article PubMed PubMed Central CAS Google Scholar
Wicky, C. et al. Telomeric repeats (TTAGGC)n are sufficient for chromosome capping function in Caenorhabditis elegans. Proc. Natl Acad. Sci. USA 93, 8983–8988 (1996).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614–620 (2014).
Article CAS PubMed Google Scholar
Hollien, J. & Weissman, J. S. Decay of endoplasmic reticulum-localized mRNAs during the unfolded protein response. Science 313, 104–107 (2006).
Article ADS CAS PubMed Google Scholar
Maurel, M., Chevet, E., Tavernier, J. & Gerlo, S. Getting RIDD of RNA: IRE1 in cell fate regulation. Trends Biochem. Sci. 39, 245–254 (2014).
Article CAS PubMed Google Scholar
Krebs, R. A. & Feder, M. E. Deleterious consequences of Hsp70 overexpression in Drosophila melanogaster larvae. Cell Stress Chaperones 2, 60–71 (1997).
Article CAS PubMed PubMed Central Google Scholar
Krebs, R. A. & Loeschcke, V. Effects of exposure to short-term heat-stress on fitness components in Drosophila melanogaster. J. Evolut. Biol. 7, 39–49 (1994).
Article Google Scholar
Williams, K. D., Helin, A. B., Posluszny, J., Roberts, S. P. & Feder, M. E. Effect of heat shock, pretreatment and hsp70 copy number on wing development in Drosophila melanogaster. Mol. Ecol. 12, 1165–1177 (2003).
Article CAS PubMed Google Scholar
Sorensen, J. G., Dahlgaard, J. & Loeschcke, V. Genetic variation in thermal tolerance among natural populations of Drosophila buzzatii: down regulation of Hsp70 expression and variation in heat stress resistance traits. Funct. Ecol. 15, 289–296 (2001).
Article Google Scholar
Narum, S. R., Campbell, N. R., Meyer, K. A., Miller, M. R. & Hardy, R. W. Thermal adaptation and acclimation of ectotherms from differing aquatic climates. Mol. Ecol. 22, 3090–3097 (2013).
Article PubMed Google Scholar
Magnabosco, C. et al. Comparisons of the composition and biogeographic distribution of the bacterial communities occupying South African thermal springs with those inhabiting deep subsurface fracture water. Front. Microbiol. 5, ARTN 67910.3389/fmicb.2014.00679 (2014).
Simkus, D. N. et al. Variations in microbial carbon sources and cycling in the deep continental subsurface. Geochim. Cosmochim Acta 173, 264–283 (2016).
Article ADS CAS Google Scholar
Powell, D. et al. The genome of the oyster Saccostrea offers insight into the environmental resilience of bivalves. DNA Res. 25, 655–665 (2018).
Article CAS PubMed PubMed Central Google Scholar
Zhang, G. et al. The oyster genome reveals stress adaptation and complexity of shell formation. Nature 490, 49–54 (2012).
Article ADS CAS PubMed Google Scholar
Guerin, M. N., Weinstein, D. J. & Bracht, J. R. Stress adapted mollusca and nematoda exhibit convergently expanded Hsp70 and AIG1 gene families. J. Mol. Evol. https://doi.org/10.1007/s00239-019-09900-9 (2019).
Article PubMed Google Scholar
Lim, C. K. et al. First human case of fatal Halicephalobus gingivalis meningoencephalitis in Australia. J. Clin. Microbiol. 53, 1768–1774 (2015).
Article PubMed PubMed Central Google Scholar
Brenner, S. The genetics of Caenorhabditis elegans. Genetics 77, 71–94 (1974).
CAS PubMed PubMed Central Google Scholar
Corvelo, A., Clarke, W. E., Robine, N. & Zody, M. C. taxMaps: comprehensive and highly accurate taxonomic classification of short-read data in reasonable time. Genome Res. 28, 751–758 (2018).
Article CAS PubMed PubMed Central Google Scholar
Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
Article CAS PubMed PubMed Central Google Scholar
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. http://www.repeatmasker.org.
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Article ADS MathSciNet CAS PubMed PubMed Central Google Scholar
Wheeler, T. J. et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 41, D70–D82 (2013).
Article CAS PubMed Google Scholar
Frazee, A. C. et al. Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat. Biotechnol. 33, 243–246 (2015).
Article CAS PubMed PubMed Central Google Scholar
Eddy, S. R. Profile hidden Markov models. Bioinformatics 14, 755–763 (1998).
Article CAS PubMed Google Scholar
Fischer, S. et al. Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Curr. Protoc. Bioinforma. 12(Chapter 6), 11–19 (2011). Unit 6.
Google Scholar
Li, L., Stoeckert, C. J. Jr. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
Article CAS PubMed PubMed Central Google Scholar
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinforma. 10, 421 (2009).
Article CAS Google Scholar
Ekseth, O. K., Kuiper, M. & Mironov, V. orthAgogue: an agile tool for the rapid prediction of orthology relations. Bioinformatics 30, 734–736 (2014).
Article CAS PubMed Google Scholar
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Article CAS PubMed PubMed Central Google Scholar
Capella-Gutierrez, S., Silla-Martinez, J. M. & Gabaldon, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Article CAS PubMed PubMed Central Google Scholar
Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164–1165 (2011).
Article CAS PubMed Google Scholar
Klopfenstein, D. V. et al. GOATOOLS: a Python library for Gene Ontology analyses. Sci. Rep. 8, 10872 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Article CAS PubMed PubMed Central Google Scholar
Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).
Article CAS PubMed PubMed Central Google Scholar
Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).
Article CAS PubMed Google Scholar
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Article CAS PubMed PubMed Central Google Scholar
Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556 (1997).
CAS PubMed Google Scholar
Yang, Z. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15, 568–573 (1998).
Article CAS PubMed Google Scholar
Chevreux, B. et al. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 14, 1147–1159 (2004).
Article CAS PubMed PubMed Central Google Scholar
Foth, B. J. et al. Whipworm genome and dual-species transcriptome analyses provide molecular insights into an intimate host-parasite interaction. Nat. Genet. 46, 693–700 (2014).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

The authors acknowledge Laura Landweber, in whose laboratory J.R.B. prepared the H. mephisto Illumina DNA library. We thank Wei Wang, Jessica Wiggins, and Donna Storton of the Princeton Sequencing Core Facility for assistance with Illumina library preparation and sequencing. We also acknowledge Aaron Goldman for early work done on the H. mephisto project, and Elizabeth Ginsburg for comments on the paper. We thank Evgeny Bisk for assistance with the installation and troubleshooting of software on the Zorro Computer Cluster at American University. Mariana Erasmus acknowledges support from the Technology Innovation Agency (TIA), an agency of the Department of Science and Technology (DST) and is grateful to Izelle de Beer and Bernice Jordaan for their assistance in culturing of the nematodes. We are indebted to the logistical support of Sibanye Gold Limited, the management and staff of Beatrix Gold Mine that enabled the capture of H. mephisto and follow-up samples for metagenomic studies. The work of M.C.Y.L. was supported by NSF Grant #DEB-1441646 to T.C.O., and J.R.B. was supported by NIH 1K22CA184297 and a NASA DC Space Grant Consortium grant which funded the RNA and PacBio sequencing.

Author information

Sarah E. Allen
Present address: Extreme Life Isyensya, Gentbrugge, 9050, Belgium
Esta van Heerden
Present address: Biology Department, Cornell University, Ithaca, NY, 14853, USA

Authors and Affiliations

Biology Department, American University, Washington, DC, 20016, USA
Deborah J. Weinstein, Sarah E. Allen, Kathryn C. Asalone, Kathryn Walters-Conte & John R. Bracht
Department of Geosciences, Princeton University, Princeton, NJ, 08544, USA
Maggie C. Y. Lau & Tullis C. Onstott
Laboratory of Extraterrestrial Ocean Systems (LEOS), Institute of Deep-Sea Science and Engineering, Chinese Academy of Sciences, No. 28, Luhuitou Road, Sanya, 572000, Hainan Province, P.R. China
Maggie C. Y. Lau
UFS/TIA Saense Platform, Department of Microbial, Biochemical, and Food Biotechnology, University of the Free State, Bloemfontein, 9301, South Africa
Mariana Erasmus & Esta van Heerden
Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
Gintaras Deikus & Robert Sebra
Extreme Life Isyensya, Gentbrugge, 9050, Belgium
Gaetan Borgonie
Biology Department, Cornell University, Ithaca, NY, 14853, USA
Sarah E. Allen
North West University, Private Bag X6001, Potchefstroom, 2520, South Africa
Esta van Heerden

Authors

Deborah J. Weinstein
View author publications
You can also search for this author in PubMed Google Scholar
Sarah E. Allen
View author publications
You can also search for this author in PubMed Google Scholar
Maggie C. Y. Lau
View author publications
You can also search for this author in PubMed Google Scholar
Mariana Erasmus
View author publications
You can also search for this author in PubMed Google Scholar
Kathryn C. Asalone
View author publications
You can also search for this author in PubMed Google Scholar
Kathryn Walters-Conte
View author publications
You can also search for this author in PubMed Google Scholar
Gintaras Deikus
View author publications
You can also search for this author in PubMed Google Scholar
Robert Sebra
View author publications
You can also search for this author in PubMed Google Scholar
Gaetan Borgonie
View author publications
You can also search for this author in PubMed Google Scholar
Esta van Heerden
View author publications
You can also search for this author in PubMed Google Scholar
Tullis C. Onstott
View author publications
You can also search for this author in PubMed Google Scholar
John R. Bracht
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

D.J.W., S.E.A., and J.R.B. performed gene prediction and phylgenetic tree construction; D.J.W. and J.R.B. performed transcriptome analysis and wrote the paper, and J.R.B. performed genome assembly. M.C.Y.L. constructed the multilocus phylogenetic tree. M.E. and E.v.H. grew the nematodes for RNA. K.C.A. performed analysis of positive selection and K.W.C performed repetitive element analysis. G.D. and R.S. performed Illumina RNA and PacBio DNA sequencing. G.B. cultured the nematodes for DNA extractions and with T.C.O. provided critical feedback on the manuscript drafts and input into experimental design.

Corresponding author

Correspondence to John R. Bracht.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Weinstein, D.J., Allen, S.E., Lau, M.C.Y. et al. The genome of a subterrestrial nematode reveals adaptations to heat. Nat Commun 10, 5268 (2019). https://doi.org/10.1038/s41467-019-13245-8

Download citation

Received: 06 November 2018
Accepted: 24 October 2019
Published: 21 November 2019
DOI: https://doi.org/10.1038/s41467-019-13245-8

This article is cited by

Chromosome-level genome assembly of Bactrocera dorsalis reveals its adaptation and invasion mechanisms
- Fan Jiang
- Liang Liang
- Shuifang Zhu
Communications Biology (2022)
Gene family expansions in Antarctic winged midge as a strategy for adaptation to cold environments
- Heesoo Kim
- Han-Woo Kim
- Seung Chul Shin
Scientific Reports (2022)
Tokorhabditis n. gen. (Rhabditida, Rhabditidae), a comparative nematode model for extremophilic living
- Natsumi Kanzaki
- Tatsuya Yamashita
- Ryoji Shinya
Scientific Reports (2021)
Speciation and adaptive evolution reshape antioxidant enzymatic system diversity across the phylum Nematoda
- Lian Xu
- Jian Yang
- Dongjuan Yuan
BMC Biology (2020)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Introduction

Results

Assembly, annotation, and phylogenetic analysis

Hsp70 expansion

AIG1 expansion

Genes differentially expressed under heat stress

Discussion

Methods

H. mephisto culture and isolation of DNA and RNA

Genomic DNA sequencing

RNA sequencing

Genome assembly

Heterozygosity

Analysis of within-family nonself blastn matches

Analysis of coverage

Analysis of repetitive sequence

Gene discovery

Gene expression

Analysis of unknown genes

Venn diagram

Domain comparisons

Multilocus phylogenetic tree

GO analysis

Hsp70 and AIG1 tree building

PAML with branch-sites model

Analysis of subtelomeric sequences

Reporting summary

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Source data

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links