The nematode Caenorhabditis elegans has been a core model organism in a wide range of biological and medical studies for several decades and has led to a number of key discoveries, including the molecular mechanisms of apoptosis1 and gene silencing by small RNAs2. Recently, C. elegans and other species of the genus Caenorhabditis have also been developed into a useful model system for a wide range of evolutionary studies3,4.

Currently, the genus Caenorhabditis contains 49 nominal species5,6,7,8. However, because of the morphological similarity among close relatives and the morphological variation within each species, species status is often delimited based on molecular barcoding and/or hybridization analyses4, and several species have been described solely on the basis of molecular phylogenetic status and mating studies6. Meanwhile, some previously described species, including C. anthobia, C. avicola, C. clavopapillata and C. craspedocercus, were characterized solely on the basis of their morphological traits (Table 1, Supplementary Text). For those species, re-isolation followed by molecular profiling and/or re-characterization of their typological characteristics is needed.

Table 1 List of nominal and characterized Caenorhabditis species, with type locality and type host or habitat.

Caenorhabditis auriculariae was initially described morphologically as an associate of the fruiting bodies of the basidiomycete fungus Auricularia polytricha9. The species is relatively easily distinguished from its congeners by its morphological characteristics9: for example, it has a short stoma, bifurcated metastegostomatal teeth, a spicule structure similar to that of the drosophilae supergroup, and a closed bursa similar to that of several species in the drosophilae and elegans supergroups. Thus, this species appeared to be rather basal in the genus, but its phylogenetic status remained undetermined.

In the present study, using re-isolated material, we examined the morphological characteristics of C. auriculariae in detail and defined its basic molecular profile based on nuclear ribosomal RNA sequences. Additionally, we sequenced the whole genome of the species and characterized its basic genomic features. These results revealed the basal phylogenetic position of C. auriculariae in the genus and its usefulness as an outgroup for understanding Caenorhabditis evolution.


Morphological characteristics

Caenorhabditis auriculariae is a rare, neglected species that had been reported only once, about 20 years ago9. We isolated C. auriculariae from a mushroom beetle, Platydema sp., feeding on the fruiting body of Auricularia polytricha in Aichi, Japan. The general morphological characteristics agree with the original description9 and are photo-documented and illustrated in Fig. 1 and Supplementary Figs. S1–S7. To avoid redundancy, only newly found characteristics are described here; several typological characteristics of the male tail that are key for taxonomy and phylogeny are explained and discussed below.

Figure 1

Stomatal structure of Caenorhabditis auriculariae. (A) En face view, in which the labial sensilla (asterisk), cephalic sensilla (c), amphids (a) and the cheilostomatal flaps on the dorsal (d), right subventral (r) and left subventral (l) sectors are indicated; (B) left lateral view; (C) ventral view. Cheilostom (c), gymnostom (g) and stegostom (s) are indicated in (B,C).

The detailed lip and cheilostomatal structures are described here for the first time (Fig. 1). The lip region is separated into six lip sectors, each with an outer labial papilla. Three pairs of neighboring lip sectors (right subventral + lateral, left subventral + lateral, and right and left dorsal) are partially fused to form three large lip sectors, which are therefore arranged triradially in en face view. The short, tube-like stoma is separated into three elements, from anterior to posterior: cheilostom, gymnostom and stegostom. The cheilostom is a cuticular tube occupying about 35–40% of the total stomatal length. The anterior part of the cheilostomatal wall (cheilorhabdion) extends and folds internally to form a half-circle-shaped flap that covers the stomatal opening like a valve. The posterior end of the cheilostom overlaps the gymnostom. The gymnostom is a simple, short cuticular tube occupying about 20–25% of the stoma. The stegostom is distinguished from the other stomatal elements by being surrounded by the pharyngeal sleeve and consists of four subelements: pro-, meso-, meta- and telostegostom. The pro- and mesostegostom are not clearly separated and form a simple cuticular tube. The metastegostom, at the posterior end of the pro-mesostegostomatal tube, forms three small bifid teeth (two subventral and one dorsal). The telostegostom is not cuticularized and connects the stoma and the pharynx.

Phylogenetic status

The phylogenetic relationships of 28 Caenorhabditis species and one outgroup species (Prodontorhabditis wirthi), inferred from the full-length SSU rRNA and the D2-D3 regions of LSU rRNA, were congruent with previously published phylogenetic trees, except for a few terminal nodes and the placement of C. plicata, which was placed within the drosophilae supergroup with low posterior probability support4,6. C. auriculariae was placed close to Caenorhabditis sonorae, which was isolated from the rotting cactus Carnegiea gigantea in the USA10, and Caenorhabditis monodelphis, which was isolated from the galleries of the fungal-feeding beetle Cis nitidus inside fruiting bodies of Ganoderma applanatum3,4. These three species formed an independent clade at the basal (outgroup) position relative to the other Caenorhabditis spp. (Fig. 2A). The GenBank accession numbers of the sequences compared are listed in Table S1.

Figure 2

Phylogenetic status of Caenorhabditis auriculariae. (A) The combined Bayesian tree inferred from near-full-length SSU and D2-D3 LSU sequences. The substitution models and parameters for SSU and D2-D3 are GTR + I + G (AIC = 16,046.5234; lnL = −8013.261; freq A = 0.2452; freq C = 0.2064; freq G = 0.2661; freq T = 0.2823; R(a) = 1.4800; R(b) = 3.4097; R(c) = 1.9899; R(d) = 0.5864; R(e) = 5.6910; R(f) = 1; Pinva = 0.4188; Shape = 0.4851) and GTR + I + G (AIC = 9579.5820; lnL = −4779.7910; freq A = 0.2163; freq C = 0.2034; freq G = 0.3165; freq T = 0.2638; R(a) = 0.7283; R(b) = 2.7572; R(c) = 1.3283; R(d) = 0.4052; R(e) = 5.6136; R(f) = 1; Pinva = 0.2046; Shape = 0.4580), respectively. Posterior probability values exceeding 50% are given for the appropriate clades. (B) Maximum-likelihood phylogeny inferred from a total of 299 one-to-one single-copy orthologues using IQ-TREE v2 with 1000 bootstrap replicates under the LG + F + R5 substitution model.
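The AIC values in this legend can be cross-checked against the log-likelihoods using AIC = 2K − 2 lnL. The sketch below is our own consistency check, not part of the original analysis. It assumes K = 10 free parameters for GTR + I + G (five free exchangeability rates, three free base frequencies, Pinva and the gamma shape), with branch lengths not counted.

```python
def aic(ln_l: float, k: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 lnL."""
    return 2 * k - 2 * ln_l


# Assumed count of free substitution-model parameters for GTR + I + G:
# 5 exchangeabilities + 3 base frequencies + Pinva + gamma shape.
# Branch lengths are deliberately not included.
K_GTR_I_G = 10

# Log-likelihoods from the Figure 2 legend (log-likelihoods of sequence data are negative).
models = {"SSU": -8013.261, "D2-D3 LSU": -4779.7910}

for name, ln_l in models.items():
    print(f"{name}: AIC = {aic(ln_l, K_GTR_I_G):.4f}")
```

With lnL taken as negative, the recomputed values (16,046.522 and 9579.582) agree with the reported AIC values to within the rounding of lnL. Positive lnL values would not reproduce them.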

Voucher material

Ten male and 10 female C. auriculariae adults have been vouchered as permanent slides in the Forest Pathology Laboratory collection of the Forestry and Forest Products Research Institute under the material numbers Caenorhabditis auriculariae M01–10 and F01–10. TAF-fixed material and unmounted glycerol-processed material have also been vouchered in the collection. Live cultures and a frozen stock of C. auriculariae have been deposited in Taisei Kikuchi's laboratory (culture code NKZ352; Miyazaki University, Miyazaki, Japan), and further genomic and transcriptomic analyses will be conducted.

Genome characteristics of C. auriculariae

For a deeper understanding of the phylogenetic status and biological features of C. auriculariae, we sequenced its genome and compared it with those of other Caenorhabditis species. The hybrid assembly of Nanopore long reads and Illumina short reads (Table S2) resulted in a 109.5 Mb assembly composed of 491 scaffolds with high completeness (89.8% BUSCO; 95.9% complete and 99.6% partial CEGMA) (Table 2). A total of 16,279 protein-coding genes, with a mean protein length of 435.34 amino acids and a maximum of 8,188 amino acids, were predicted on the genome assembly. The genome is ~10% larger than that of C. elegans, but the predicted gene number is smaller. The C. auriculariae genome contains 20.8 Mb of repetitive sequences, accounting for 19.04% of the genome, a proportion similar to that of the C. elegans genome (18.45%) (Table 3). DNA transposons were the most abundant classified repeat family (1.6%), followed by the LINE (0.91%) and LTR (0.51%) families, although a large portion of the C. auriculariae repeats (15.24% of the genome) remained unclassified. Compared with C. elegans, more retroelements (LINEs and LTRs) were identified in C. auriculariae, which is consistent with the higher number of transposon genes (Table 2) and RVT (reverse transcriptase) domains (see Pfam results below) in the C. auriculariae gene models.

Table 2 Genome and gene model statistics for C. auriculariae and comparison with other Caenorhabditis species.
Table 3 Repeat contents in C. auriculariae and C. elegans.

We then performed a phylogenomic analysis of the 35 Caenorhabditis species with available draft genome sequences, using D. pachys as an outgroup. An ML tree based on 97 single-copy genes showed a topology mostly consistent with the nuclear rRNA tree: species of the elegans group and of the japonica group each formed a separate cluster, and species of the drosophilae supergroup were located at a more basal position in the tree. C. auriculariae was placed at the most basal position of the genus Caenorhabditis together with C. monodelphis (Fig. 2B). C. parvicauda, which shows a morphological novelty (secondary loss of the bursa) and is considered highly divergent11, has a long branch in the tree but belongs to the inner clade.

The two basal species, C. auriculariae and C. monodelphis, have similar genome statistics: their total assembly sizes are 109.5 and 115.1 Mb, and their predicted gene numbers are 16,279 and 17,180, respectively (Table 2). The gene structures of C. auriculariae are also similar to those of C. monodelphis: genes are generally longer, contain more exons and have a longer total intron span than C. elegans genes (Fig. 3), a pattern suggested to reflect the ancestral state of Caenorhabditis genome structure12.
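The three per-gene metrics compared in Fig. 3 (total CDS length, CDS count and intron span) can be derived from CDS coordinates roughly as follows. This is a hypothetical helper, not the pipeline used in the study. It assumes non-overlapping CDS segments from a single transcript in 1-based inclusive coordinates (as in GFF3) and approximates intron span as the summed gaps between consecutive CDS segments, so introns within UTRs are ignored.

```python
from dataclasses import dataclass


@dataclass
class GeneStructure:
    cds_length: int   # total coding length (bp)
    cds_count: int    # number of CDS segments
    intron_span: int  # summed gaps between consecutive CDS segments (bp)


def gene_structure(cds: list[tuple[int, int]]) -> GeneStructure:
    """Summarise one gene from its CDS coordinates (1-based, inclusive).

    Assumes non-overlapping CDS segments of a single transcript; introns that
    fall in UTRs are not seen, so intron_span is a CDS-based approximation.
    """
    segments = sorted(cds)
    cds_length = sum(end - start + 1 for start, end in segments)
    intron_span = sum(
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(segments, segments[1:])
    )
    return GeneStructure(cds_length, len(segments), intron_span)
```

Applied per single-copy orthologue in each species, the resulting distributions could then be compared as in Fig. 3.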

Figure 3

Comparison of gene structure in single-copy orthologues between C. auriculariae, C. elegans and C. monodelphis. Whereas total CDS length per gene is similar in the three species (A), C. auriculariae and C. monodelphis have more CDS segments per gene (B) and longer intron spans per gene (C) than C. elegans. CAUJ, C. auriculariae; CMON, C. monodelphis; CELE, C. elegans.

Comparison of protein domain (Pfam) distribution patterns revealed that, compared with C. elegans, C. auriculariae has more Ank, LRR, HEAT, TIL, HTH_Tnp_Tc3_2, DDE_3, RVT_1 and DEAD domains, and fewer domains related to receptors (GPCRs, Hormone_recep and Recep_L_domain), WD40, Collagen, Ig_3, I-set, V-set, Pkinase, EGF, zinc finger, Shk, C2-set_2, FTH, FBA_2 and Lectin_C domains (Fig. 4). Gene family (orthologue) analysis assigned a total of 389,541 genes (90.8%) from 18 Caenorhabditis species and D. pachys to 31,748 orthogroups, of which 4971 were shared by all species. A large number of orthogroups (9737) are shared by C. auriculariae and C. monodelphis, of which 356 are unique to this clade. However, the two species still have large numbers of species-specific orthogroups: 2546 and 3880 orthogroups are unique to C. auriculariae and C. monodelphis, respectively (Fig. 5). The C. auriculariae-specific orthogroups include genes encoding proteins with Ank (ankyrin), TIL (trypsin inhibitor-like cysteine-rich domain), LEA_4 (late embryogenesis abundant), GPCR (G protein-coupled receptor), Collagen, Pkinase (protein kinase) and apolipoprotein domains (Table S3), suggesting that genes with these functions are highly diverged in C. auriculariae, possibly reflecting its unique lifestyle, which has not yet been characterized.

Figure 4

Pfam domain abundance in C. auriculariae and C. elegans. The x-axis represents the abundance of Pfam domains in C. elegans and the y-axis the abundance of the same domains in C. auriculariae. The linear regression line is plotted together with its equation and correlation coefficient. Enriched Pfam domains are labelled with their names.
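The regression and correlation coefficient shown in Fig. 4 can be sketched with ordinary least squares over per-domain counts. The rule used to label "enriched" domains is not stated, so the ratio threshold in `enriched` (observed count at least twice the fitted value) is a hypothetical choice for illustration only.

```python
import math


def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def enriched(names: list[str], x: list[float], y: list[float],
             slope: float, intercept: float, min_ratio: float = 2.0) -> list[str]:
    """Hypothetical labelling rule: domains whose count in C. auriculariae (y)
    is at least min_ratio times the value predicted from C. elegans (x)."""
    hits = []
    for name, xi, yi in zip(names, x, y):
        predicted = slope * xi + intercept
        if predicted > 0 and yi >= min_ratio * predicted:
            hits.append(name)
    return hits
```

Here x and y would hold the counts of each Pfam domain in C. elegans and C. auriculariae, respectively, in the same domain order as names. A residual-based or depletion-aware rule would be an equally reasonable alternative.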

Figure 5

Orthologue comparison across the 18 Caenorhabditis and D. pachys genomes. UpSetR plot showing unique and overlapping protein ortholog clusters. The intersection matrix is sorted in descending order. Blue bars represent the orthogroup size for each genome and connected dots represent intersections of overlapping orthogroups while vertical bars show the size of each intersection. Orthogroups unique to C. auriculariae and C. auriculariae-C. monodelphis are shown in red and green, respectively.
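The orthogroup numbers in the text mix two kinds of counts: orthogroups that contain a given set of species, possibly among others (e.g., the 9737 shared by C. auriculariae and C. monodelphis), and orthogroups that contain exactly that set (e.g., the 356 unique to the clade, which corresponds to one UpSet intersection). The sketch below illustrates the distinction. The input mapping of orthogroup to species set is hypothetical, and the species codes follow the Fig. 3 abbreviations.

```python
from collections import Counter


def exclusive_intersections(orthogroups: dict[str, set[str]]) -> Counter:
    """UpSet-style counts: each orthogroup is counted once, under the exact set
    of species that contain it."""
    return Counter(frozenset(species) for species in orthogroups.values())


def containing(orthogroups: dict[str, set[str]], species: set[str]) -> int:
    """Non-exclusive count: orthogroups that contain all of `species`
    (other species may also be present)."""
    return sum(1 for members in orthogroups.values() if species <= members)


# Toy input for illustration only (not the study's data).
toy = {
    "OG1": {"CAUJ", "CMON", "CELE"},
    "OG2": {"CAUJ", "CMON"},
    "OG3": {"CAUJ"},
    "OG4": {"CAUJ", "CMON"},
}
print(containing(toy, {"CAUJ", "CMON"}))                          # shared, non-exclusive
print(exclusive_intersections(toy)[frozenset({"CAUJ", "CMON"})])  # unique to the pair
```

In this toy case, three orthogroups contain both species, but only two are unique to that pair. Analogously, the 356 clade-specific orthogroups are a subset of the 9737 shared ones.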

Carbohydrate-active enzymes (CAZymes) are involved in several biological processes, including feeding, energy metabolism, structural support and signal transduction13, and the CAZyme repertoire of a genome generally reflects the organism's lifestyle. We identified a total of 312 CAZy genes in C. auriculariae (5 auxiliary activities (AA), 32 carbohydrate-binding modules (CBM), 47 carbohydrate esterases (CE), 71 glycoside hydrolases (GH) and 157 glycosyltransferases (GT)), a number comparable to those of other Caenorhabditis species (Table S4). Many CAZy families are common to the 35 Caenorhabditis species (e.g., two AA, seven CBM, four CE, 18 GH and 30 GT families), although the number of genes in each varies, whereas some are species- or group-specific (e.g., GH131 in C. brenneri and GH88 in C. guadeloupensis). To reduce the complexity of the CAZy distribution patterns across the 35 Caenorhabditis species, we conducted a principal component analysis (PCA). The first two principal components explained 73.4% of the overall variance (55.9% and 17.5% for PC1 and PC2, respectively) (Fig. 6). The PCA plot clustered species largely by taxonomic group: elegans-group species were mostly located in the upper right, most japonica- and drosophilae-group species in the lower middle, and the basal group in the upper left. Interestingly, however, the CAZy-based plot also appears to be strongly associated with particular lifestyles. For instance, C. inopinata, C. japonica, C. drosophilae and C. bovis were placed close to each other, although they belong to the elegans group, the japonica group, the drosophilae supergroup and a separate basal clade, respectively. These four nematodes are well-known insect associates that use insects as dispersal vectors14,15,16. Conversely, C. angaria and C. castelli are phylogenetically close to each other but were clearly separated along PC1 and PC2; correspondingly, C. angaria tends to ride on weevils17, whereas no insect association has been reported for C. castelli. We tested whether the trait (insect association) is related to the CAZy distribution using phylogenetic logistic regression with the PC values and found a significant correlation between PC1 and the trait (p < 0.01) (Fig. 6). The hermaphroditic species (C. elegans, C. briggsae and C. tropicalis) were also placed together in the PCA plot, even though hermaphroditism evolved independently in these lineages during Caenorhabditis evolution; however, the corresponding regression test was not statistically significant (Fig. 6).

Figure 6

Principal component analysis (PCA) score plot of carbohydrate-active enzyme (CAZy) distribution. The first two axes explain 55.9% and 17.5% of the total variance, respectively. Taxonomic groups and ecological traits (hermaphroditism and insect association) are indicated by point shapes and colours, respectively. The ellipse in the plot illustrates the 95% prediction region. The best models of phylogenetic logistic regression for the ecological traits with the PCs are shown in the box.

In the PCA plot, C. auriculariae was placed together with C. monodelphis in the upper left, suggesting that these basal species have similar lifestyles and possibly tight associations with insects, which is consistent with their isolation from beetles.


Morphological comparison with other molecularly characterized Caenorhabditis spp.

Caenorhabditis auriculariae was described in 1999, before the deep-level phylogeny of the genus and the relationship between morphological characteristics and phylogenetic status had been examined9. Later, Kiontke et al.4 examined nominal and undescribed Caenorhabditis spp. using multiple molecular loci and coded their typological characteristics in a phylogenetic analysis.

The genus Caenorhabditis has been separated into two supergroups (the elegans and drosophilae supergroups) based on molecular phylogenetic analyses and male tail characteristics. In addition, several species do not fall into either supergroup4,6 and are tentatively regarded as the basal group. Basal group species, including C. auriculariae, have some typical characteristics from both supergroups that are hypothesized to represent the stem species pattern, namely: (1) an oval, anteriorly open bursa without serratae or a terminal notch on the edge of the velum, (2) nine pairs of bursal rays in which p2 reaches the edge of the velum, p2 and p3 are clearly separate, and p1 is directed dorsally, (3) a rounded precloacal lip, (4) a spicule with a slightly ventrally bent blade and a complex tip, and (5) a parallel mating position4, although several species-specific apomorphies, e.g., the secondary loss of the bursa in C. parvicauda11, have been reported. As in other basal group species, comparing the morphological and molecular phylogenetic status of C. auriculariae reveals several species-specific apomorphies (or clade-specific ones, if a closely related cryptic species exists). The typological characteristics of the C. auriculariae male tail are (1) a wide, heart-shaped bursa with a serrated anterior velum edge and no terminal notch, (2) nine pairs of bursal rays arranged as (p1d, p2)/p3, (p4 + p5d), p6, (p7, p8d), (ph, p9), where p2 does not reach the edge of the velum and p2 and p3 are clearly separate, (3) a precloacal lip forming a heart-shaped or bifid cap structure, (4) a stout, evenly curved spicule with a complex tip (possessing a small, dorsally oriented projection at the distal end), and (5) a parallel (not spiral) mating position.

The spicule morphology of C. auriculariae is similar to that of several drosophilae supergroup species (C. drosophilae, C. angaria, C. castelli, and Caenorhabditis sp. 2 and sp. 8) and C. monodelphis4,6,10,17. The bursal velum morphology is similar to that of all elegans supergroup species and several drosophilae supergroup species (C. portoensis, C. virilis and C. latens)4,6. The arrangement of the bursal rays is somewhat intermediate between the two supergroups. For example, the short p2 that does not reach the edge of the velum resembles that of the elegans supergroup and C. virilis; the dorsally directed p5d is shared with the elegans supergroup and C. monodelphis; the clearly separate p2 and p3 are shared with all non-elegans group species; and the dorsally directed p8d is shared with two drosophilae supergroup species (C. portoensis and C. virilis)4,6. Additionally, the parallel mating position is shared with all known Caenorhabditis species except three drosophilae supergroup species (C. angaria, C. castelli and Caenorhabditis sp. 8)4,6. The heart-shaped cap on the precloacal lip and the arrangement of the bursal rays (see above) are unique to C. auriculariae, as is its stomatal morphology. Consequently, none of the nominal (and characterized) species matches the typological characteristics of C. auriculariae exactly, and C. auriculariae can be distinguished from all other phylogenetically characterized Caenorhabditis spp. on typological characteristics alone.

The rRNA- and genome-based phylogenies suggested that C. auriculariae is closely related to C. sonorae and C. monodelphis, as these three species formed a well-supported independent clade at the basal position of the genus. However, they can be clearly separated by their typological characteristics, as their bursal velum, ray characteristics and precloacal lip structure differ from each other4,10. In addition, C. auriculariae has a quite unique stomatal morphology, with a long flap-like cuticular extension on the cheilostom and three bifid metastegostomatal teeth. Although the stomatal characteristics of C. monodelphis have not been described in detail, both C. sonorae and C. monodelphis have a long and narrow stoma; C. sonorae has three triangular teeth, which is common in the genus, whereas C. monodelphis does not have a glottoid apparatus5,10. The unique stomatal structure of C. auriculariae could therefore be a species- (or clade-) specific apomorphy.

Biological features and genome

Caenorhabditis nematodes have been isolated from many different environments and animals, such as rotting fruit3,4,10, rich soil and manure18,19,20,21, mushrooms3,9, insects14,17,22, soil and freshwater invertebrates22,23, and vertebrates, potentially including humans24,25,26. Some vertebrate associations could be due to insect carriers associated with the “host” vertebrates. C. monodelphis and C. auriculariae were originally isolated from G. applanatum in Berlin, Germany, and from A. polytricha in Kyoto, Japan, respectively, and are associated with fungal-feeding beetles3,9. In the present study, C. auriculariae was isolated from a fungal beetle, Platydema sp. Although the details of the carrier association (e.g., whether this beetle is the primary carrier of C. auriculariae, which body part of the beetle harbours the nematode, and the number of nematodes per beetle and the association rate) were not clarified in this study, the ability of C. auriculariae to associate with insects (phoresy) was at least confirmed. Because of the limited data, we cannot conclude whether the fungal (mushroom) associations of these species reflect clade-specific habitat preferences or carrier insects. However, the present results will be useful for isolating further strains of these species. Diplogastrid nematodes of the genus Pristionchus were long considered soil-inhabiting free-living nematodes, and their close insect association has been confirmed only recently27,28. Likewise, the close association of Caenorhabditis spp. with rotting fruit went unrecognized for a long time4, and after this association was found, the number of newly isolated species increased dramatically3,4,29. Similarly, this study and recent reports on insect-associated Caenorhabditis spp. should promote the discovery of new species in the genus through surveys of nematodes associated with insects.

The genome comparison revealed the presence of highly diverged or unique genes encoding GPCRs in C. auriculariae. GPCRs act as primary receptors for a wide variety of environmental signals and are therefore highly diverse among organisms and even among individuals30,31. The unique GPCR repertoire of C. auriculariae probably reflects the need to detect environmental signals specific to its lifestyle, such as its mushroom and insect associations. The genome comparison also identified diverged LEA proteins in C. auriculariae. LEA proteins were initially discovered as proteins that accumulate late in cotton seed embryogenesis and were later shown to protect other proteins against aggregation caused by desiccation or osmotic stress in some plants, bacteria and invertebrates32,33. Further functional investigation is necessary, but this may reflect a life cycle in which the nematode encounters relatively dry conditions compared with C. elegans.

PCs based on CAZy distribution separated insect-associated species from non- or weakly associated species regardless of their phylogenetic relationships. Furthermore, this approach roughly separated hermaphroditic from gonochoristic species, even when two closely related sister species have contrasting reproductive modes. This method may therefore be particularly useful for inferring the lifestyle of newly isolated species for which detailed ecological information is lacking. For example, because C. pamanensis was placed within the insect-associate ellipse (Fig. 6), we could speculate that this species has a lifestyle with a tight insect association, although no such association has been recorded (Table 1). Indeed, there are several rare Caenorhabditis species with unclear ecological status, such as C. yunguensis (Table 1).

This study provides a high-quality genome reference for C. auriculariae. A genome of C. monodelphis was recently published as an outgroup reference for Caenorhabditis12. C. auriculariae is also placed at the basal position of the genus and shares several genome features with C. monodelphis. However, the evolutionary distance between the two species is substantial, and each genome contains a number of species-specific genes. Therefore, C. auriculariae, together with C. monodelphis, provides a powerful resource for deep evolutionary studies of the genus Caenorhabditis.


Nematode materials

Potential carrier insects of nematodes were collected in the field in Nagoya, Aichi, Japan on 17 June 2015. The samples were collected under an official permit from the Nagoya City local governmental office. Several species of coleopteran insects (beetles) were collected, brought back to the laboratory, morphologically identified, and dissected to examine their association with nematodes. The dissected insect bodies were placed in 2.0% water agar to allow propagation of phoretic microbe-feeding species and examined occasionally. No endangered or protected species were collected in the present study.

A Caenorhabditis sp. was isolated from the dissected body of Platydema sp. (Coleoptera: Tenebrionidae); the nematode was not observed during dissection but propagated on the dissected body of its carrier beetle. The nematode was observed under a light microscope (Eclipse 80i; Nikon, Tokyo, Japan) to determine its feeding habits, then transferred to nematode growth medium (NGM) and maintained as a laboratory strain with culture code NKZ352.

Morphological observations and micrographs

Live and TAF-fixed C. auriculariae material from 2-week-old cultures was observed under a light microscope using the methodology described by Kanzaki34. The nematodes were identified to species based on typological characteristics by comparison with the original description9. The TAF-fixed material was then processed into glycerin according to a modified Seinhorst's method35 and deposited as morphological vouchers.

Several morphological characteristics that were not provided in the original description, e.g., detailed stomatal structure, were drawn using a drawing tube, and other general characteristics were photo-documented using a digital camera system (DS-Ri1, Nikon) connected to a microscope.

Scanning electron microscope (SEM) observation

For SEM observation, adult nematodes were treated with pre-fixation solution (2% paraformaldehyde, 2.5% glutaraldehyde, 0.1 M cacodylate, pH 7.4) for 2 h at 4 °C, followed by incubation in fixation solution (1% OsO4, 0.1 M cacodylate, pH 7.4) for 1 h at 4 °C. Samples were then dehydrated through a graded ethanol series (50% to 100%), transferred to isoamyl acetate and dried using a freeze-drying device (Eiko ID-2). Dried nematodes were coated with platinum using an ion sputter coater (Hitachi E-1045) and observed using an SEM (Hitachi S-4800) operating at 20 kV.

Molecular profiles and preliminary phylogenetic analyses

Prior to the genome-wide phylogenetic analysis of selected species, the phylogenetic status of C. auriculariae within the genus was analysed based on ribosomal RNA genes. Nematode lysate was prepared as a polymerase chain reaction (PCR) template according to the protocols of Kikuchi et al.36 and Tanaka et al.37. The small subunit ribosomal RNA (SSU rRNA) gene and the D2-D3 regions of the large subunit ribosomal RNA (LSU rRNA) gene were sequenced using the PCR direct sequencing methods of Ye et al.38 and Kanzaki and Futai39.

A Bayesian molecular phylogenetic analysis was conducted based on SSU and D2-D3 LSU as previously described40. The sequences were aligned using MAFFT41, and the base substitution model was selected using Modeltest ver. 3.742 under the Akaike information criterion (AIC). A Bayesian analysis was then performed to infer the tree topology of each gene using MrBayes 3.243; four chains were run for 4 × 10⁶ generations, and the Markov chains were sampled every 100 generations44. Two independent runs were performed; after convergence of the runs was confirmed and the first 2 × 10⁶ generations were discarded as burn-in, the remaining topologies were used to generate a 50% majority-rule consensus tree.
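As a worked example of the sampling settings above, the number of trees involved can be estimated as follows. This is our own arithmetic. It assumes that, as in MrBayes, only the cold chain of each run is sampled and that the post-burn-in trees of both runs are pooled, and it ignores any extra sample a run may record at generation 0.

```latex
\begin{aligned}
\text{samples per run} &= \frac{4 \times 10^{6}}{100} = 40\,000,\\
\text{burn-in samples per run} &= \frac{2 \times 10^{6}}{100} = 20\,000,\\
\text{trees pooled for the consensus} &= 2 \times (40\,000 - 20\,000) = 40\,000.
\end{aligned}
```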

DNA/RNA isolation and sequencing

For whole-genome analyses, nematodes were propagated on NGM plates seeded with E. coli strain OP50. After 2 weeks of incubation at 20 °C, nematodes were collected from the plates and washed five times with M9 buffer, and genomic DNA was extracted using Genomic-tip (Qiagen) following the manufacturer's protocol. Paired-end and mate-pair sequencing libraries were prepared using the Nextera DNA Sample Prep kit (Illumina) and the TruSeq DNA Library Preparation kit, respectively, according to the manufacturer's instructions, and sequenced on an Illumina MiSeq with the v3 kit (301 cycles × 2 or 76 cycles × 2) (Illumina) (Supplementary Table S2).

Two μg of genomic DNA was used to prepare a Nanopore sequencing library with the Ligation Sequencing Kit SQK-LSK109 (Oxford Nanopore Technologies) according to the manufacturer's protocol. The library was sequenced in a single 24 h run on an R9 MinION flow cell (FLO-MIN106; Oxford Nanopore Technologies). Base calling was performed with Guppy v.3.1.5 using the ‘dna_r9.4.1_450bps_fast’ model, yielding 771,594 reads (~3 Gb) (Supplementary Table S2).
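The Nanopore yield above translates into an approximate sequencing depth and mean read length as sketched below. This is our own back-of-the-envelope calculation. The "~3 Gb" total is approximate, and the 109.5 Mb assembly size is used as a stand-in for the true genome size.

```python
def mean_depth(total_bases: float, genome_size: float) -> float:
    """Expected mean sequencing depth: total sequenced bases / genome size."""
    return total_bases / genome_size


def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length: total sequenced bases / number of reads."""
    return total_bases / n_reads


# Values from the text; "~3 Gb" is approximate, and the final assembly size
# is used as a proxy for the (unknown) true genome size.
NANOPORE_BASES = 3e9
NANOPORE_READS = 771_594
ASSEMBLY_SIZE = 109.5e6

print(f"Nanopore depth: ~{mean_depth(NANOPORE_BASES, ASSEMBLY_SIZE):.0f}x")
print(f"Mean read length: ~{mean_read_length(NANOPORE_BASES, NANOPORE_READS):.0f} bp")
```

This gives roughly 27× coverage and a mean read length of about 3.9 kb. A mean below the read N50 of 4.7 kb reported for assembly is typical, because N50 weights long reads more heavily.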

For mRNA-seq analysis, RNA was extracted from fresh mixed-stage nematodes using TRI reagent according to the manufacturer's instructions. Total RNA quality was assessed using a Bioanalyzer 2100 (Agilent Technologies, Inc.), and only samples with an RNA integrity number (RIN) greater than 8.0 were used for library construction. One hundred nanograms of total RNA was used to produce an Illumina sequencing library with the TruSeq RNA-seq Sample Prep kit according to the manufacturer's recommended protocol (Illumina). The RNA libraries were sequenced on an Illumina MiSeq with the v3 kit (301 cycles × 2) (Illumina) (Supplementary Table S2).

Genome assembly

Three de novo assemblers were used to generate initial assemblies. The Nanopore reads (~771 K reads, N50 = 4.7 kb) were assembled with Flye (v.2.7.1)45 in raw Nanopore mode using -g 100m, or with Canu46 using genomeSize=100m; both assemblies were then base-corrected with Illumina DNA reads using Pilon (v.1.22)47. SPAdes (v.3.7.1)48 was separately used, with default options, to generate a hybrid assembly of Nanopore, Illumina paired-end and mate-pair reads (Supplementary Table S2) after the Illumina reads had been trimmed for low quality and adaptor contamination using Trimmomatic (v.0.32)49,50. The three assemblies were merged using MetaAssembler (v.1.5)51 with the Flye assembly as the reference. HaploMerger2 (20151106)52 was run on the merged assembly to remove remaining haplotypic sequences. Further base correction was performed with iCORN253 using ~5 Gb of Illumina paired-end reads. Contigs derived from bacteria or other contaminating organisms were identified and removed from the assembly using Blobtools54 and BlastN searches against the NCBI bacterial nt database. CEGMA v255 was used to assess the completeness of the assemblies.
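The read N50 quoted above (4.7 kb) is the length L such that reads of length ≥ L together contain at least half of all sequenced bases. The same statistic is commonly reported for contigs and scaffolds. A minimal sketch of the standard definition follows; it is not tied to any particular tool used in the study.

```python
def n50(lengths: list[int]) -> int:
    """Return N50: the largest length L such that sequences of length >= L
    together contain at least half of all bases."""
    if not lengths:
        raise ValueError("n50() requires at least one sequence length")
    total = sum(lengths)
    cumulative = 0
    for length in sorted(lengths, reverse=True):
        cumulative += length
        if 2 * cumulative >= total:  # integer comparison avoids rounding issues
            return length
    raise AssertionError("unreachable for non-negative lengths")
```

For example, for lengths 2 through 10 (54 bases in total), the three longest sequences (10 + 9 + 8 = 27) first reach half of the bases, so the N50 is 8.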

Gene prediction

RNA-seq read pairs were aligned to the C. auriculariae assembly using Hisat2 v2.1.056 with default parameters and used to generate intron hints with the bam2hints script in Augustus v3.3.257. Protein-coding genes on the assembly were predicted using BRAKER258 with the intron hints and protein homology hints from ~78,000 proteins of nine nematode species (Brugia malayi, Bursaphelenchus xylophilus, Caenorhabditis elegans, C. briggsae, Necator americanus, Pristionchus pacificus, Strongyloides ratti, Trichinella spiralis and Trichuris muris). Protein domains were annotated on the gene models by Pfam searches (ver. 28.0)59 with HMMER v3.1b260 using an E-value cutoff of 1e−5.
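One way to apply the 1e−5 E-value cutoff and tally Pfam domains per genome is sketched below. It assumes the per-sequence table written by `hmmscan --tblout`: whitespace-delimited columns, `#` comment lines, the Pfam model name in column 1, the query protein in column 3 and the full-sequence E-value in column 5. The study does not state which HMMER output was parsed, and counting proteins per domain (rather than individual domain copies) is a simplification.

```python
from collections import Counter
from typing import Iterable


def parse_tblout(lines: Iterable[str], max_evalue: float = 1e-5) -> list[tuple[str, str, float]]:
    """Parse hmmscan --tblout lines into (protein, Pfam model, E-value) tuples,
    keeping hits whose full-sequence E-value is <= max_evalue."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        fields = line.split(maxsplit=18)  # last column (description) may contain spaces
        model, protein, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue:
            hits.append((protein, model, evalue))
    return hits


def proteins_per_domain(hits: list[tuple[str, str, float]]) -> Counter:
    """Number of distinct proteins carrying each Pfam model."""
    return Counter(model for _, model in {(protein, model) for protein, model, _ in hits})
```

Usage would be `proteins_per_domain(parse_tblout(open(path)))` for each species. The resulting counts are the kind of per-domain abundances compared between C. auriculariae and C. elegans.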

Carbohydrate-active enzyme analysis

Carbohydrate-active enzymes (CAZymes) were detected using the CAZy database61 and HMMER v3.1b2 with an E-value cutoff of 1e−5. Possible bacterial or fungal contaminants were removed from the detected CAZy genes based on BlastP searches against the NCBI nr database. The CAZy genes of each species were then counted separately for auxiliary activities, carbohydrate-binding modules, carbohydrate esterases, glycoside hydrolases, polysaccharide lyases and glycosyltransferases.
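Counting CAZy genes per class can be sketched by mapping each family name (e.g., GH18, GT2, CBM14) to its class prefix. The input mapping of gene to CAZy families is hypothetical. How genes carrying families from several classes were counted in the study is not stated; here such a gene is counted once in each of its classes.

```python
import re
from collections import Counter

# CAZy class prefixes; "CBM" precedes "CE" in the alternation, but both match correctly
# because the regex engine backtracks to the next alternative on mismatch.
CLASS_RE = re.compile(r"^(AA|CBM|CE|GH|GT|PL)\d+")


def count_classes(gene_families: dict[str, list[str]]) -> Counter:
    """Count genes per CAZy class; a gene carrying several families of the
    same class is counted once for that class."""
    counts: Counter = Counter()
    for families in gene_families.values():
        classes = {m.group(1) for f in families if (m := CLASS_RE.match(f))}
        counts.update(classes)
    return counts
```

For C. auriculariae, the reported class totals (5 AA, 32 CBM, 47 CE, 71 GH and 157 GT, with no polysaccharide lyases) sum to the 312 CAZy genes stated in the Results.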

Principal component analysis of the CAZy distribution across 35 Caenorhabditis species was performed using the prcomp function, and the results were visualised with the factoextra package62, both in R. Phylogenetic logistic regressions for ecological traits (reproductive mode or insect association) were performed with the phylolm R package63, using the principal component values (PC1 to PC4) as explanatory variables and the tree shown in Fig. 2B as phylogenetic information under the logistic_IG10 method, and the best models were selected by the Akaike information criterion (AIC).
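For readers working outside R, the PCA step can be approximated with an SVD of the column-centred species × CAZy-family count matrix, which is what prcomp computes by default. This NumPy sketch is an illustration, not the study's code. Whether the counts were also scaled (prcomp's `scale. = TRUE`) is not stated, so scaling is left as an option, and the phylogenetic logistic regression step is not covered.

```python
import numpy as np


def pca(counts: np.ndarray, scale: bool = False):
    """PCA of a species x CAZy-family count matrix via SVD.

    Columns are centred, matching prcomp's default; scale=True additionally
    divides each column by its standard deviation (prcomp's scale. = TRUE).
    Returns (scores, proportion of variance per PC, loadings).
    """
    x = np.asarray(counts, dtype=float)
    x = x - x.mean(axis=0)
    if scale:
        sd = x.std(axis=0, ddof=1)
        x = x / np.where(sd == 0, 1.0, sd)  # constant columns stay all-zero
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u * s                    # species coordinates on each PC (prcomp's $x)
    explained = s**2 / np.sum(s**2)   # proportion of total variance per PC
    return scores, explained, vt.T    # loadings as columns (prcomp's $rotation)
```

Because the sign of each PC is arbitrary in SVD-based PCA, a re-analysis may produce a plot mirrored relative to Fig. 6 (e.g., the basal species in a different corner) without any change in interpretation. The first two entries of `explained` correspond to the 55.9% and 17.5% reported for PC1 and PC2.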

Orthologous relationship of C. auriculariae with other Caenorhabditis species and constructing phylogenetic tree

Orthology analysis of C. auriculariae with 17 selected Caenorhabditis species and Diploscapter pachys as an outgroup was performed using OrthoFinder v2.3.1164 with default parameters on the longest isoform set of each species. The distribution of orthogroups among species was visualised using the UpSetR R package65.

For the genome-wide phylogenetic analysis, amino acid sequences of 96 single-copy orthologues in 37 species were aligned using MAFFT v7.22141 with the --auto option. Poorly aligned regions were removed using Gblocks v0.91b66 with the parameters -t=p -b4=10 -b5=n -b6=y -s=y -p=y -e=-gb. The alignments were concatenated and used to infer a maximum-likelihood tree with RAxML v8.0.2667. For the RAxML analysis, the alignment was partitioned by gene, and the PROTGAMMAAUTO model (which selects the best-fitting protein model for each partition) was used for all partitions. Topological robustness was assessed with 100 rapid bootstrap replicates. The resulting phylogenetic tree was visualized in FigTree v1.4.468.
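The concatenation step can be sketched as building a supermatrix from the trimmed per-gene alignments while recording each gene's column range, which is what a partitioned RAxML run needs. This helper is hypothetical. It assumes each alignment is held as a mapping of taxon to aligned sequence and pads taxa missing from a gene with gaps (which should not occur for strictly single-copy orthologues present in all species). The exact model-name syntax for RAxML partition files should be checked against the RAxML manual.

```python
def concatenate(
    alignments: dict[str, dict[str, str]],
) -> tuple[dict[str, str], list[tuple[str, int, int]]]:
    """Concatenate per-gene alignments (gene -> {taxon: aligned sequence}).

    Taxa missing from a gene are padded with gaps. Returns the supermatrix
    and 1-based inclusive (gene, start, end) partition coordinates.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are empty or differ in length")
        width = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * width))
        partitions.append((gene, start, start + width - 1))
        start += width
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, partitions
```

Each returned (gene, start, end) tuple maps to one partition line such as `gene1 = 1-250`, preceded by whatever model keyword the RAxML version in use expects.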