Biology and genome of a newly discovered sibling species of Caenorhabditis elegans

A ‘sibling’ species of the model organism Caenorhabditis elegans has long been sought for use in comparative analyses that would enable deep evolutionary interpretations of biological phenomena. Here, we describe the first sibling species of C. elegans, C. inopinata n. sp., isolated from fig syconia in Okinawa, Japan. We investigate the morphology, developmental processes and behaviour of C. inopinata, which differ significantly from those of C. elegans. The 123-Mb C. inopinata genome was sequenced and assembled into six nuclear chromosomes, allowing delineation of Caenorhabditis genome evolution and revealing unique characteristics, such as highly expanded transposable elements that might have contributed to the genome evolution of C. inopinata. In addition, C. inopinata exhibits massive gene losses in chemoreceptor gene families, which could be correlated with its limited habitat area. We have developed genetic and molecular techniques for C. inopinata; thus C. inopinata provides an exciting new platform for comparative evolutionary studies.

1/5 of total stoma, posteriorly overlapping with the anterior end of gymnostom. Gymnostom simple tube-like, ca. twice as long as cheilostom, i.e., occupying ca. 1/3 of total stoma or a little more, weakly separated into two subsections, although the separation is difficult to observe by microscopic observation; each subsection is hypothesized to be associated with an arcade syncytium. Stegostom covered by pharyngeal sleeve and separable from gymnostom, separated into four subsections: pro-, meso-, meta- and telostegostom. Pro- and mesostegostom not clearly separated, fused to form a simple tube almost the same length as the gymnostom. Metastegostom forming three flap-like teeth, one on each of the dorsal, right subventral and left subventral sectors, the outer part weakly sclerotized to form a ring surrounding the posterior part of metastegostom. Telostegostom without clear armature, forming small funnel connecting stoma and pharynx (procorpus). Pharynx separated into four sections: procorpus, metacorpus (median bulb), isthmus and basal bulb. Pro- and metacorpus forming muscular anterior pharynx, and the other two sections forming glandular posterior pharynx. Procorpus muscular tube occupying ca. 60% or a little more of corresponding body diam. Metacorpus forming muscular median bulb without clear valve or glottoid apparatus. Isthmus narrow, not muscular. Basal bulb well-developed with double haustrulum as the glottoid apparatus (grinder-like structure typical of rhabditid nematodes). Pharyngo-intestinal valve (cardia) prominent. Nerve ring around the middle of isthmus. Excretory pore located around the margin of isthmus and basal bulb, perpendicular to body surface and possessing tube-like excretory-secretory duct. A large cell visible on ventral side, at level of, or a little posterior to, cardia, assumed to be a secretory cell associated with secretory-excretory system.
Female. Body straight or slightly ventrally arcuate when killed by heat. Gonadal system didelphic, amphidelphic.
Anterior and posterior gonadal system on the right and left of intestine, respectively, and basically symmetric with each other; thus the anterior gonad is described from distal part to vulva/vagina. The gonadal system arranged as ovary, oviduct, spermatheca, spermathecal-uterus junction tissue, uterus and vulva/vagina from distal (anterior). Distal part of ovary reflexed dorsally, and often tangled under culture conditions; oocytes arranged in multiple (2-5) rows in the reflexed part, and well-developed oocytes arranged in a single row near oviduct. Oviduct short, composed of small and rounded cells connecting ovary and spermatheca. Spermatheca composed of large and squared cells forming roundish rectangular-shaped sac. Spermatheca-uterus junction distinctive; a band of long cells with a thread-like appearance is surrounded by small and rounded cells.
Uterus well-developed, often containing many (sometimes more than 20) developing eggs.

Vagina perpendicular to body surface constricted by sphincter muscle at the uterus-vagina junction. Vulva horizontal slit with vulval lips slightly protruding. Tail conical or forming slightly elongated conus with pointed tip. Anus and rectum clearly visible; the intestine-rectum junction constricted with sphincter muscle and surrounded by three (two subventral and one dorsal) rectal glands. Anal opening dome-shaped slit in ventral view. Phasmid forming small pore located laterally at ca. 60% of total tail length from anus.
Male. Testis single-armed on the right subventral of intestine, anteriorly reflexed rightwardly.
Spermatocytes arranged in three to four rows in the reflexed part; well-developed spermatocytes in the two to three rows in the middle part; mature spermatids tightly packed in the rest of testis. Vas deferens occupying ca. 1/5 of total gonadal length, composed of large cells, fused with the intestine (rectum) in its posterior end (at the level of spicule) to form a narrow cloacal tube. Tail enveloped by a closed bursa, supported by nine pairs of genital papillae (bursal rays). Anterior cloacal lip with a rounded and sclerotized appendage and bulge-like appendage between rounded appendage and cloacal opening; a small sensilla-like papilla on the bulge-like appendage. Posterior cloacal lip with tongue-like appendage with two cloacal sensilla. Spicules paired, separate, long and slender with evenly slightly ventrally curved blade (calomus-lamina complex) and simply pointed tip.
Gubernaculum slender, ventrally arcuate with small squared appendage at the distal end in lateral view; forming spindle shape with outwardly pointed appendage in ventral view. Bursa heart-shaped in ventral view, anteriorly closed with serrated edge; serratae obvious in anterior half and vague in posterior half; terminal notch present but unclear. The nine pairs of genital papillae or bursal rays supporting the bursal velum are arranged as (2/1+1+2+3), i.e., first and second rays (r1 and r2) anterior to cloacal opening, close (stuck) to each other, third ray (r3) adcloacal, fourth ray (r4) slightly posterior to r3, fifth and sixth rays (r5 and r6) close (stuck) to each other and slightly posterior to r4, seventh to ninth rays (r7-9) grouped at the middle between r6 and tail tip. The r1, r6 and r9 almost reach the edge of velum; the others open dorsally or ventrally in the velum. The r1, r5 and r7 open dorsally, forming papilla-form tips; r2, r3, r4, r8 and r9 open ventrally, forming papilla-form tips. The r6 with tube-like dorsal opening and expanded root, the ray forming a bowling-pin shape. Phasmids sensilla-like, around the root of r9. Supplementary

Mating incompatibilities
To confirm that C. inopinata and C. elegans are reproductively isolated, multiple replicate crosses between the two species were made (Supplementary Table 2). All crosses were performed at 23-25°C using 2 or 5 males and 2 or 3 females (or hermaphrodites); parents were transferred daily. In addition, mating behaviour was observed during 2-5 min observation periods, 2-3 times per day, until the females died.
For C. elegans males crossed to C. inopinata females, we observed mating behaviour (scanning for vulva, stereotypical turning behaviour by the males). However, females laid no embryos and received no sperm for all 16 females examined. For the reciprocal cross direction, C. inopinata males were tested against three hermaphrodite strains: wild-type N2, dpy-5, and unc-119. All three crosses produced abundant self progeny broods indicating that C. inopinata males do not fertilise C. elegans. We also crossed C. inopinata males to fog-2 (q71) pseudo-females 11 . Although we observed male mating behaviour, no embryos were produced and no sperm was transferred (n=21 pseudo-females). In contrast to the lack of cross progeny between the species, control con-specific crosses produced viable F1 and F2 progeny, as expected. Together these results demonstrate reproductive isolation between C. inopinata and C. elegans.
We also tested 5 other Caenorhabditis species (C. briggsae, C. brenneri, C. remanei, C. guadeloupensis, and C. sp. 50) and found reproductive isolation in all cases (Supplementary Table 2). For four species, mating behaviour was observed or inferred from the presence of mating plugs in at least one cross direction. Of these, only the cross of C. inopinata males to C. brenneri females yielded embryos, but none hatched into larvae.

Growth on NGM agar plate
To find the optimal laboratory conditions for C. inopinata culture, we tested C. inopinata with the standard C. elegans culture method at various temperatures. C. inopinata nematodes, sterilized with a 1% (v/v) chlorine bleach solution as described before 12 , were transferred to nematode growth medium (NGM) plates seeded with Escherichia coli strain OP50-1 to establish a pure culture. At 25°C on the standard media, C. inopinata grew and multiplied slowly compared to C. elegans. C. inopinata can multiply at culture temperatures ranging from 15°C to 29°C (Supplementary Fig. 3b), an upper limit much higher than that of C. elegans.
Worms feeding on E. coli strain HT115 showed larger and developed gonads in adults and had larger brood size than those feeding on strain OP50 (Supplementary Table 3), which allowed us to easily perform worm manipulations such as microinjection and microdissection, suggesting strain HT115 might be more suitable for the laboratory culture.

Sex ratio
C. elegans is generally considered a free-living nematode, and mating between males and hermaphrodites produces equal numbers of males and females. In contrast, C. inopinata  Table 6). The number of wasp carcasses (i.e. an old generation of wasps) found in a syconium was one or two in most cases. We also detected dauer-like stage C.

Ascarosides
Caenorhabditis elegans produces a set of secondary metabolites, several of which have been shown to serve as social cues, including components of the dauer-promoting pheromone and soluble mating attractants 27 . Ascarosides are glycosides of the dideoxy sugar ascarylose and can have various modifications that affect their activity. Since ascarosides are made by many nematodes and each species appears to make a distinct spectrum of ascarosides 28 , we tested whether C. inopinata makes some key ascarosides, and then tested their attraction to several that they make.
We tested the response of C. inopinata males to these compounds at a range of concentrations from fM to nM and found that they are attracted to ascr#1, ascr#3, and ascr#10 but not ascr#9 (Figure 4c). For ascr#10, C. inopinata males exhibit the same concentration preference as C. elegans males. For ascr#3, C. inopinata males are attracted at fM concentrations, whereas C. elegans males are attracted at pM concentrations. C. elegans was not tested at the concentrations of ascr#1 that attract C. inopinata. We further tested the response of C. inopinata males to a mixture of the ascaroside concentrations that produced the three highest CI values. C. inopinata showed a preference for the mixture solution comparable in magnitude to its preference for one of the mixture's components, ascr#3 (1 fM). These results provide evidence that the signalling molecules (and perhaps, indirectly, downstream pathways) that facilitate mate recognition are conserved between C. elegans and C. inopinata.

Karyotype
Nematodes were fixed with ice-cold methanol and stained with DAPI as previously described 29 . Microscopic observations were carried out using a fluorescence microscope system (FSX-100, Olympus) and a confocal laser scanning microscope (LSM700, Zeiss).
Twelve chromosomes, all of similar length, were observed in prophase cells in two- and four-cell-stage embryos (Supplementary Fig. 4a), suggesting that the karyotype of C. inopinata is 2n = 12. We confirmed this in a total of 20 eggs or embryos, all of which had 12 chromosomes; no specimen showed a different chromosome composition.

Collection of nematode materials for DNA and RNA sequencing
The C. inopinata reference genome was assembled from genomic DNA obtained from C. inopinata inbred line NKZ35 (OMT-10) generated from the original strain NK74SC.
Nematodes were cultured on NGM agar plates supplemented with streptomycin (100 µg/ml) and E. coli OP50-1 at 25°C for 10 days and harvested using a modified Baermann funnel 30 .
Samples were washed three times with M9 buffer and genomic DNA was extracted using Genomic-tip (Qiagen) following the manufacturer's instructions.
For RNA-seq analyses, age-synchronised worms were grown in S-medium 12 with concentrated E. coli OP50-1 at 23°C from eggs collected via bleaching gravid females 12 .
Second or third stage larvae (L2/3) were collected from 48 hr culture, and adult females and males were separately obtained from 96 hr culture by hand-picking using a needle. RNA was extracted from ~500 worms (mixed-stage, L2/3, adult female and adult male) using TRI reagent according to the manufacturer's instructions. Total RNA samples were qualified using Experion (BioRad), and only samples with an RNA integrity quality (RIQ) greater than 8.0 were used for library construction.

Library preparation and sequencing
Paired-end sequencing libraries (Supplementary Table 8

Optical map
High molecular weight genomic DNA was prepared using CHEF Genomic DNA Plug Kits (BioRad) and then mapped using IRYS Mapping System (BioNano). The fragments were fluorescently stained and visualized to determine fragment sizes. Assembling overlapping fragment patterns of single molecule restriction maps produced an optical map of the genome, which was used to improve the genome assembly of PacBio/Illumina data.

Assembly and manual improvement
Illumina reads from multiple paired-end and mate-pair libraries (Supplementary Table 8) were assembled using the Platanus assembler 31 with the default parameters to produce the v1 assembly.
Two assemblies consisting of PacBio reads were produced separately using Falcon The two assemblies from the two different technologies (Illumina and PacBio) were merged using Metassembler (v3 assembly). Then, Haplomerger2 36 was used to remove haplotypic sequences in the assembly, and contigs were further scaffolded with Illumina mate-pair reads using SSPACE 37 and manually curated using gap5 (v4 assembly) 38 . Base correction was performed with 8.8 Gb of Illumina paired-end reads using ICORN2 39 with 5 iterations to produce the v5 assembly.
The v5 genome assembly was further improved by optical mapping (see above). The optical map consisted of 239 contigs, an assembled size of 141.7 Mb and approximately 78x genome coverage of optical data. The optical map data were used to order and orientate sequence scaffolds, to measure the size of sequence gaps, and independently validate the sequence assembly, producing genome assembly v7.
To assess the completeness of the assemblies we used CEGMA v235 40 and BUSCO 41 .
CEGMA and BUSCO report the percentage of highly conserved eukaryotic gene families that are present as full or partial genes in the assembly. For most eukaryotes, 100% (or nearly 100%) of core gene families represented by a full gene in the genome would be expected.
Thus, these provide a measure of the completeness of the assembly for a species. The C. inopinata assembly (v7) showed high completeness ( Fig. 4b). The median depth of coverage of male sequence data for this scaffold was approximately 50% compared with the rest of the genome, while the same scaffold had a coverage of 100% using data from immature females (Supplementary Fig. 4b). We propose that this scaffold represents the X-chromosome. We did not observe regions that are potentially Y-chromosome-specific sequences (i.e., highly heterozygous in females).
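The coverage argument above can be illustrated with a minimal sketch. It assumes per-window read depths are already available for each scaffold and each sex. The function names, the tolerance of 0.15 and the toy depths are hypothetical and are not taken from the analysis itself.

```python
from statistics import median

def relative_depth(depths_by_scaffold, scaffold):
    """Median depth of `scaffold` divided by the median depth of all other scaffolds."""
    rest = [d for name, ds in depths_by_scaffold.items()
            if name != scaffold for d in ds]
    return median(depths_by_scaffold[scaffold]) / median(rest)

def x_candidates(male, female, tol=0.15):
    """Scaffolds with ~0.5x relative depth in males and ~1.0x in females.

    `tol` is an arbitrary illustrative tolerance, not a published cutoff.
    """
    hits = []
    for name in male:
        m = relative_depth(male, name)
        f = relative_depth(female, name)
        if abs(m - 0.5) <= tol and abs(f - 1.0) <= tol:
            hits.append(name)
    return hits

# Toy per-window depths (invented numbers, for illustration only).
MALE = {"scaf_A": [40, 42, 38, 41], "scaf_B": [19, 21, 20, 22], "scaf_C": [39, 41, 40, 43]}
FEMALE = {"scaf_A": [50, 52, 49, 51], "scaf_B": [51, 50, 48, 52], "scaf_C": [49, 50, 53, 51]}

print(x_candidates(MALE, FEMALE))
```

In the toy data only `scaf_B` has roughly half the male depth of the other scaffolds while keeping full depth in females, which is the pattern expected for an X-linked scaffold in an XO male.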

Repeat analysis
Repeats within the assemblies were identified using the combined outputs of We have identified highly expanded LTR elements in C. inopinata compared to C. respectively. This is predominantly due to an increase in RTE (retrotransposable element) LINEs, which comprise 1.27% of the C. inopinata genome compared with only 0.02%, or no, RTEs present in the C. elegans and C. briggsae genomes, respectively (Supplementary Table 10).
The distribution of LTR elements and DNA repeat elements is associated with genome rearrangements as well as with lost or unique genes in C. inopinata. By examining synteny between C. inopinata and C. elegans, we found that synteny breaks in the C. inopinata genome were enriched in transposon-related genes (see Synteny analysis). In particular, in a region of chromosome V analogous to the regions in C. elegans and C. briggsae where the argonaute protein-coding gene ergo-1 is located, gene order has undergone rearrangement and gene loss, including loss of orthologues of ergo-1 and k10c9.1/cbg17878, which codes for a protein of unknown function. In this region two LTR elements and one TcMar transposase are present (Figure 5c), suggesting that high LTR retrotransposon and/or DNA transposase activities could be a driving force of the gene loss in this region. Expansion of DNA transposons and LTR elements in the C. inopinata genome could therefore contribute to evolutionary divergence between C. inopinata and C. elegans.

Gene finding
To predict protein-coding genes, Augustus (v. 3.0.1) 48 was trained for C. inopinata based on a training set of 1500 non-overlapping, non-homologous and manually curated genes from the initial gene predictions based on CEGMA core genes 40 and Augustus C. elegans parameters. A selection of gene models was curated in Artemis 49 using aligned RNA-seq data and BLAST 50 matches against the NCBI database. RNA-seq reads were mapped to the genomes using Hisat2 51 (parameters: --rna-strandness RF --min-intronlen 20 --max-intronlen 10000). Based on the Hisat2 alignments, the bam2hints program (part of the Augustus package) was used to create the intron hints, with minimum length set to 20 bp.
The mapped RNA-seq reads were also assembled into transcripts using Cufflinks 52 with minimum intron length of 20 bp. The predicted exons in the resultant set of transcripts were used as the exonpart hints.
The trained versions of Augustus were run using all the hints for that species as input.
Introns starting with 'AT' and ending with 'AC' were allowed (--allow_hinted_splicesites=atac). A weight of 10^5 was given to intron and exonpart hints from RNA-seq. The minimum intron length was set to 15 bp. If Augustus predicted multiple, alternatively spliced transcripts for a gene, we only kept the transcript corresponding to the longest predicted protein, yielding a total of 20,976 gene models (v7.7 gene set).
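The longest-isoform rule described above can be sketched as follows. This is a minimal illustration, assuming each prediction is available as a (gene ID, transcript ID, protein sequence) tuple; the function name and the tie-breaking rule (the first transcript seen wins) are assumptions made for the example.

```python
def keep_longest_isoform(transcripts):
    """Return {gene_id: transcript_id} for the longest predicted protein per gene.

    `transcripts` is an iterable of (gene_id, transcript_id, protein_seq).
    On ties the first transcript encountered is kept (an arbitrary choice).
    """
    best = {}
    for gene_id, tx_id, protein in transcripts:
        current = best.get(gene_id)
        if current is None or len(protein) > len(current[1]):
            best[gene_id] = (tx_id, protein)
    return {gene: tx for gene, (tx, _) in best.items()}

# Toy predictions (invented identifiers and sequences).
PREDICTIONS = [
    ("g1", "g1.t1", "MKT"),
    ("g1", "g1.t2", "MKTAYL"),
    ("g2", "g2.t1", "MSS"),
]

print(keep_longest_isoform(PREDICTIONS))
```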
Gene structures in the Augustus predictions were refined by manual curation in Artemis, using RNA-seq mapping results and BLASTP 50 results against C. elegans proteins. In this manual curation process, we mainly targeted erroneously fused gene models (v7.9 gene set). We further curated genes coding for 7TM-GPCRs intensively, using TBLASTN 50 with C. elegans GPCRs as queries, to obtain the v7.10 gene set. If a gene was manually curated, it replaced the original Augustus gene prediction in the gene set, and we finally obtained 21,609 gene models (Table 1).

Functional annotation
Assigning protein names and GO terms to predicted proteins.
Unique names were assigned to each predicted protein, following UniProt's protein naming guidelines (http://www.uniprot.org/docs/nameprot), where possible. One-to-one or many-to-one (e.g. many-C. inopinata to one-C. elegans) orthologues were first identified based on phylogenetic trees from the OrthoFinder analyses (see below), together with 7 other Caenorhabditis species and UniProt protein names and locus names of the C. elegans orthologues were transferred to the C. inopinata genes.
Gene ontology (GO) terms were also assigned to genes by transferring GO terms from We detected CAZymes using the dbCAN database 56 with HMMER3 with e-value thresholds (1e-13 for >80aa, 1e-9 for everything else) from the gene sets of C. inopinata, C. elegans and C. briggsae. A phylogenetic tree of the genes in each CAZyme family was generated and if necessary further categorisation was performed based on the tree.
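The length-dependent e-value cutoff quoted above can be written as a small filter. This sketch assumes that ">80aa" refers to the aligned length of the hit and that the comparison is strict; both points are assumptions, and the function name is hypothetical.

```python
def passes_cazyme_cutoff(evalue, aligned_length):
    """Apply the two-tier e-value threshold: 1e-13 for hits longer than 80 aa,
    1e-9 for everything else.

    Assumption: `aligned_length` is the alignment length in amino acids and
    the e-value must be strictly below the threshold.
    """
    threshold = 1e-13 if aligned_length > 80 else 1e-9
    return evalue < threshold

# Toy hits: (e-value, aligned length in aa); invented numbers.
HITS = [(1e-15, 120), (1e-11, 120), (1e-11, 60), (1e-5, 60)]
print([passes_cazyme_cutoff(e, n) for e, n in HITS])
```

The second toy hit shows why the two tiers matter: an e-value of 1e-11 is accepted for a short alignment but rejected for a long one.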
Almost identical repertoires of CAZyme families were found in C. inopinata compared to those found in C. elegans and C. briggsae (Supplementary Table 13). In C. inopinata, 23 families of glycoside hydrolases (GH) were identified, but no enzymes that were related to Although C. elegans has a higher number of glycosyltransferases (285) than C. inopinata, the GT family repertoires were identical to each other.

Identification of gene families, orthologues and paralogues
To establish orthology relationships among Caenorhabditis species, non-redundant proteomes (protein sets containing only longest isoforms) of eight Caenorhabditis species and an outgroup species were obtained from WormBase (version WS255). These species include C. elegans, C. briggsae, C. angaria, C. brenneri, C. japonica, C. remanei, C. sinica, C. tropicalis and P. pacificus. We used OrthoFinder (v0.2.8) 57 with default options to assign the orthology.
A total of 264,976 genes from 10 species were assigned into 16,861 orthologue groups with a median size of 10. The numbers of orthologue groups with all species present, single-copy orthologue groups and species-specific orthologue groups were 5300, 1016 and 541, respectively (Figure 2a).
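The three orthologue-group categories counted above can be tallied from a per-species gene-count table. The following is a minimal sketch, assuming each group is represented as a dictionary of gene counts per species; the function name, the species codes and the toy groups are hypothetical. Single-copy groups are treated here as a subset of the groups with all species present.

```python
def summarise_orthogroups(orthogroups, species):
    """Count groups with all species present, single-copy groups and
    species-specific groups.

    `orthogroups` maps group ID -> {species: gene count}.
    """
    summary = {"all_species": 0, "single_copy": 0, "species_specific": 0}
    for counts in orthogroups.values():
        present = [s for s in species if counts.get(s, 0) > 0]
        if len(present) == len(species):
            summary["all_species"] += 1
            if all(counts[s] == 1 for s in species):
                summary["single_copy"] += 1
        elif len(present) == 1:
            summary["species_specific"] += 1
    return summary

# Toy input with three species codes (invented data).
SPECIES = ["cel", "cin", "cbr"]
GROUPS = {
    "OG1": {"cel": 1, "cin": 1, "cbr": 1},   # single-copy in every species
    "OG2": {"cel": 3, "cin": 1, "cbr": 2},   # all species, multi-copy
    "OG3": {"cin": 4},                       # species-specific family
    "OG4": {"cel": 1, "cbr": 1},             # neither category
}

print(summarise_orthogroups(GROUPS, SPECIES))
```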

Species tree reconstruction
From the OrthoFinder results, 1,016 gene families were identified that contain a single gene from each species (single-copy orthologues). The proteins in each single-copy family were aligned using MAFFT v7.221 58 , poorly aligned regions were trimmed using Gblocks v0.91b 59 , and then the 1000 trimmed alignments (all sequences in 16 alignments were removed in the Gblocks trimming) were concatenated. A maximum likelihood phylogenetic tree was produced based on the concatenated alignment, with each protein alignment treated as an independent partition of these data, applying the best-fitting substitution model identified using the RAxML option (-m PROTGAMMAAUTO). This inference used RAxML v8.2.7 60 with 10 random addition-sequence replicates and 500 bootstrap replicates, and otherwise default heuristic search settings.
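The concatenation step, in which alignments emptied by trimming are dropped and each remaining alignment becomes its own partition, can be sketched as below. This is an illustration only: the function name and the toy alignments are hypothetical, and partitions are returned as plain (name, start, end) tuples rather than in any tool-specific partition file format.

```python
def concatenate_alignments(alignments):
    """Concatenate trimmed alignments into a supermatrix.

    `alignments` is a list of (name, {species: aligned_sequence}).
    Alignments of length zero (everything removed by trimming) are skipped.
    Returns (supermatrix, partitions), with 1-based inclusive coordinates.
    """
    species = sorted(alignments[0][1])
    pieces = {s: [] for s in species}
    partitions = []
    start = 1
    for name, aln in alignments:
        length = len(aln[species[0]])
        if length == 0:
            continue  # nothing left after trimming
        for s in species:
            pieces[s].append(aln[s])
        partitions.append((name, start, start + length - 1))
        start += length
    supermatrix = {s: "".join(parts) for s, parts in pieces.items()}
    return supermatrix, partitions

# Toy trimmed alignments for two taxa (invented sequences).
TRIMMED = [
    ("og1", {"a": "MK-", "b": "MKT"}),
    ("og2", {"a": "", "b": ""}),
    ("og3", {"a": "LL", "b": "LV"}),
]

matrix, parts = concatenate_alignments(TRIMMED)
print(matrix, parts)
```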
The resulting maximum likelihood tree showed an identical topology as trees inferred by

Divergence estimate
We estimated neutral divergence between C. inopinata and C. elegans following Cutter's method of divergence time estimation 61,62 . To begin the estimation, we computed lineage-specific rates of synonymous (dS) and non-synonymous (dN) substitution for orthologous groups of genes (see above) between each of the Caenorhabditis species in the phylogeny using Codeml in PAML (v4.9) 63 with options (runmode = 0, CodonFreq = 2, model = 1, fix_omega = 0, omega = 0.01). Lineage ages (T) were inferred by applying the median of synonymous-site divergence values and direct measures of the average per-site mutation rate in C. elegans (μ = 9.0 × 10^-9 mutations per generation) 64 to the equation from the neutral theory of molecular evolution, T = dS/μ. Since there would be saturation in dS among species, effective codon usage numbers (Nc) were calculated by DAMBE (v6) 65 Fig. 4c). Following Cutter's method 61,62 and assuming a 60-day average generation time (~6 generations per year) yields separation times of 23.46 million years and 21.00 million years for C. inopinata and C. elegans, respectively, from their most recent common ancestor (TMRCA) (Supplementary Fig. 4c). Considering that the vector wasp of C. inopinata has a 6-8 week generation time 22 and that C. inopinata should have at least 2 generations per wasp generation, according to our observations of non-dauer-like 3rd stage larvae in syconia, the average generation time is likely less than 30 days, making the separation time <12 million years.
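The arithmetic behind these estimates can be made explicit with a short sketch. It applies T = dS/μ to obtain a lineage age in generations and converts it to years with an assumed generation time. The function names and the 365-day year are assumptions made for the example, and the dS value in the first call is a placeholder, not a value from the analysis.

```python
def divergence_years(dS, mu_per_generation, generation_days, days_per_year=365.0):
    """Lineage age in years from T = dS / mu (in generations)."""
    generations = dS / mu_per_generation
    return generations * generation_days / days_per_year

def rescale_generation_time(time_years, old_generation_days, new_generation_days):
    """Re-express a separation time under a different generation time.

    Time in years scales linearly with the assumed generation time.
    """
    return time_years * new_generation_days / old_generation_days

# Placeholder dS of 0.9 with mu = 9.0e-9 and one generation per year.
print(divergence_years(0.9, 9.0e-9, 365))

# The 23.46-million-year estimate at 60 days, re-expressed at 30 days.
print(rescale_generation_time(23.46e6, 60, 30))  # 11730000.0
```

Halving the generation time from 60 to 30 days halves the estimate to about 11.7 million years, which is consistent with the "<12 million years" statement above for generation times shorter than 30 days.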

Gene family analysis
Genes in each Caenorhabditis species were categorised based on their relationships in orthologous groups: i) Conserved one-one; one-many; many-many present in all species, ii) Caenorhabditis specific, iii) Elegans super-group-specific, iv) C. elegans and C. inopinata specific, v) members of species-specific gene family, and vi) species-specific singletons.
Genes that did not fit any of the above categories were categorised as "other". To estimate branch or lineage specific gain and loss of orthologous gene families, we used CAFE (v3) 66 with gene family results from OrthoFinder and an ultrametric phylogenetic tree, using divergence values calculated above, as inputs under parameters "-p 0.01, -r 1000".
CAFE identified 198 gene families (with a total of 2349 genes) with significantly higher-than-expected rates of gains/losses in the C. inopinata lineage (P ≤ 0.01, Supplementary Table 14).
The significantly expanded gene families were enriched in GO terms associated with transposons, whereas significantly contracted gene families were enriched in GO terms related to signalling receptions (Supplementary Table 14). The GO term "G-protein coupled olfactory receptor activity" was detected both in expanded and contracted families. Most of the species-specific lineage gene gains are singleton genes (Figure 2a). Interestingly, there is greater gene loss in C. inopinata (2,993) compared to that in C. elegans (1,403) (Figure 2a). In conclusion, the number of genes is smaller in the C. inopinata and C. elegans clade, and this can be linked to higher levels of gene loss in this clade than in the other Caenorhabditis clades. Moreover, partially conserved genes ('other genes' category) account for the major difference in gene number between C. inopinata and C. elegans.

Conservation of key biological pathways
To examine conservation of, and differences in, key biological pathways in C. inopinata, C. elegans and C. briggsae, we looked closely at orthologues involved in well-studied biological pathways of interest, including Ins/IGF-1 signalling, dauer formation, sex determination, and small RNA pathways (Supplementary Tables 15-19).
Most of those orthologues were identified in C. inopinata in a one-to-one manner with C. elegans and C. briggsae, suggesting conservation of key biological genes in the group, and also supporting the high quality of gene predictions in C. inopinata. Some exceptions were, however, found in each pathway. For example, in the Ins/IGF-1 signalling pathway, we found that serine/threonine kinase Akt/PKB genes are duplicated in C. elegans (akt-1 and akt-2) and that one of the two 14-3-3 proteins in C. elegans (encoded by par-5/ftt-1 and ftt-2) is missing in C. inopinata, whereas two copies of the daf-18 orthologue, which exists as a single copy in C. elegans, were found in C. inopinata.
Interestingly, we found a loss of ergo-1, eri-6/7 and eri-9 orthologues in C. inopinata. The gene loss was confirmed by orthofamily analysis and by BLASTp/tBLASTn searches against the C. inopinata genome and gene models using the C. elegans sequences as queries. No similar sequences were found in the transcriptome data either. These three genes all code for proteins involved in the ergo-1 siRNA pathway in C. elegans. In comparison, other siRNA pathways, e.g., the ALG3/4 pathway, are highly conserved among the three Caenorhabditis species (Supplementary Table 19). ERGO-1 forms part of the ERGO-1 Argonaute complex that binds to 26G siRNAs, particularly in the female germline where 26G RNAs are enriched in oocytes and embryos through to early larval development 67 Table 11). The ERGO-1 protein is closely related to the piwi-related Argonautes PRG-1 and PRG-2 (piRNA pathway). These Argonaute proteins bind to 21U RNAs which predominantly target transposons and thus have a role in transposon regulation 70 . In common with the piRNA pathway, the ERGO-1 26G-RNA pathway also triggers the production of 22G-RNAs and their subsequent binding to WAGO-type Argonaute proteins. The ERGO-1 Argonaute is less well characterised than the piwi-related Argonautes in nematodes, and more research is required to establish whether ERGO-1 is also specifically involved in transposon regulation, as well as in gene family expansion more generally. Furthermore, in C. elegans, mut-1 is an essential component of the ERGO-1, but not the ALG3/4, 26G siRNA pathway. In mut-1 mutants the expression of ERGO-1 is suppressed and this is coupled with a suppressed desilencing of transposable elements 71 . It is also of interest to note that mariner-like DNA transposon genes can be found in the same region of the C. inopinata genome from which ergo-1 was lost, and where LTR elements are also present (Figure 5c).
Compared with the ergo-1 region of the genome which has undergone rearrangement in C. inopinata, the region that corresponds to the eri-9 region shows greater gene synteny with C. elegans and C. briggsae. In C. elegans the eri-6 and eri-7 genes are located in the same region of the genome, but in C. inopinata this region is located in two distinct regions of chromosome I, more similar to the gene synteny observed for C. briggsae (Data not shown).

GPCRs
Chemoreception of environmental stimuli is a major sensory system in small soil nematodes like C. elegans 72,73 . Chemoreception is mediated in C. elegans by members of the seven-transmembrane G-protein-coupled receptor class (7TM GPCRs). Those receptors in C. elegans are specifically called "serpentines" and comprise approximately 1300 apparently intact genes with ~400 pseudogenes 73 . In the Pfam domain analyses, we found that domains related to 7TM were significantly contracted in C. inopinata compared to C. elegans (see above, Supplementary Table 12). We manually curated 7TM-GPCR genes in the C. inopinata genome based on TBLASTN searches using C. elegans serpentines as queries, and elegans and C. briggsae were constructed using MAFFT v.7.221 58 and FastTree 2.1.8 74 . It has been reported that C. elegans has highly expanded serpentine families and that the gene number differences between C. elegans and other Caenorhabditis species were the result of gene expansions in C. elegans rather than gene losses in other species 72,73 . Our phylogenetic analyses, however, further revealed that massive gene losses clearly occurred in C. inopinata, as we found many one-to-one or one-to-many clusters of C. elegans and C. briggsae that do not contain C. inopinata genes in the serpentine family trees (e.g. srd, srh, sre and str) (Figure 4b). We also found local expansions of C. inopinata serpentine genes in those trees, which keep the C. inopinata serpentine gene number as high as that of C. briggsae (Figure 4b, Supplementary Fig. 6). This may reflect the C. inopinata life style, in which the nematode does not require a wide variety of receptors to detect environments like C. elegans or other soil nematodes because of its limited habitat area (inside fig fruit), and the characteristic lifecycle, in which fine detection of specific types of chemical stimuli may be required, probably due to the close interaction with the fig and the vector insect (Figure 1c).

Synteny analysis
Gene synteny was inferred between C. inopinata, C. elegans and C. briggsae using DAGchainer 75 based on positions of orthologous genes. Synteny-linking plots were generated using R and Circos (v0.69-4) 76 . C. briggsae scaffolds on Chr.II, Chr.III and Chr.IV were reverse complemented to illustrate the most parsimonious scenario. Gene collinearity is largely conserved, with frequent rearrangements within chromosomes, among the three Caenorhabditis genomes (Supplementary Fig. 7a). The frequencies of rearrangements on autosomes appear to be independent of species relatedness as well as arm/center definition (Figure 2b, Supplementary Fig. 7a). Between C. inopinata and C. elegans, only a few inter-chromosomal rearrangements were detected (Figure 2b, Supplementary Fig. 7a). 76% of the C. inopinata genome can be assigned to blocks of collinear genes (synteny), compared with ~75% of the C. elegans genome (Table 1) Table 21).
To investigate why the C. inopinata genome is 20 Mb larger than that of C. elegans despite similar coding content (Table 1, Supplementary Table 9), we looked at the distribution of syntenic block size in both species. We found that longer intergenic spacing is mainly responsible for larger block size on chromosomes of C. inopinata (Supplementary Fig. 7b). Intron sizes are slightly larger in C. inopinata except on chromosomes IV and X (Supplementary Fig. 7b).
Chromosome X contains some of the most conserved and largest blocks and fewer rearrangements (Supplementary Fig. 7a, 7d). Conversely, chromosome V contains species-specific gene families which led to synteny breaks (Supplementary Fig. 7a, 7d).

Arm/centre dichotomy
It has been shown that features such as tandem repeats and conserved genes are distributed unequally across the chromosomes in Caenorhabditis species due to varying recombination rates between arm and centre regions 77,78 . The completeness of the C. inopinata genome allowed us to revisit the differences of the chromosome arm/centre that appear to be a hallmark of Caenorhabditis genome evolution.
Center and arm regions were defined according to Ross et al. 79 in C. elegans and according to the distribution of tandem repeats in C. inopinata, as tandem repeats have been shown to correlate with crossover rate in the C. elegans genome 78 . We found that these features were concentrated in autosomal arm regions, which is consistent with other Caenorhabditis genomes (Supplementary Fig. 7c) 77,78 . Synonymous substitution rates (dS) between C. elegans and C. inopinata orthologues in autosomal arm regions are significantly higher than those in center regions (p=1.15e-13), while no difference was found between the two regions of chromosome X (p=0.7662), whose dS values are lower than those of their autosomal counterparts (p=3.368e-11) (Supplementary Fig. 7e). Interestingly, conserved single copy genes are predominantly found on chromosome X (Supplementary Fig. 7d). As a result, the arm/center dichotomy is apparent in the genome of C. inopinata, but conserved genes are dispersed across autosomal arms and centers.