Annelid genomes: Enchytraeus crypticus, a soil model for the innate (and primed) immune system

Enchytraeids (Annelida) are soil invertebrates with worldwide distribution that have served as ecotoxicology models for over 20 years. We present the first high-quality reference genome of Enchytraeus crypticus, assembled from a combination of Pacific Biosciences single-molecule real-time and Illumina sequencing platforms as a 525.2 Mbp genome (910 gapless scaffolds and 18,452 genes). We highlight isopenicillin N synthase, acquired by horizontal gene transfer and conferring antibiotic function. Significant gene family expansions associated with regeneration (long interspersed nuclear elements), the innate immune system (tripartite motif-containing protein) and response to stress (cytochrome P450) were identified. The gene for ACE (angiotensin-converting enzyme), a homolog of ACE2, which is involved in cell entry of the coronavirus SARS-CoV-2, is also present in E. crypticus. E. crypticus therefore has clear potential as a model to study interactions between regeneration, the innate immune system and aging-dependent decline.

Enchytraeus crypticus is a soil-dwelling annelid worm that has been used over the past two decades as an ecotoxicology model. Here, Mónica Amorim and colleagues present the first genome for E. crypticus. The authors identify a number of expanded gene families, including several involved with innate immunity.

As Charles Darwin pointed out well over 100 years ago, 'The plough is one of the most ancient and valuable of man's inventions; but long before he existed the land was in fact regularly ploughed, and still continues to be thus ploughed by earthworms. It may be doubted whether there are many other animals which have played so important a part in the history of the world, as have these lowly organized creatures.' 1 Without worms, it is likely that the earth's soil would not be capable of sustaining the growth of food for humans and other plant-eating species. Annelid worms comprise >22,000 species and are found worldwide in all types of habitats. Yet knowledge of their genomes has been virtually absent until now. Sequencing big animals (e.g., gorillas 2 ) has a large impact on conservation, but small species, often invisible to the naked eye, are well known for their role in supporting life itself 3,4 .
Well-known model species with sequenced genomes include Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. However, very few ecotoxicology models have become genome model species, i.e., equipped with genomic-level endpoints in addition to phenotypic endpoints. The genome of Folsomia candida, a standard terrestrial ecotoxicology arthropod collembolan species 5 , was sequenced 6 in 2017. This added on to the first aquatic ecotoxicology Daphnid model 7 . The species Daphnia pulex was sequenced 8 in 2008; however, this is not the commonly tested Daphnia magna species. Recently among soil annelids, the Eisenia andrei genome has been sequenced 9 ; this provides a high-quality assembly for a soil representative model that is also used in ecotoxicology similarly to Eisenia fetida. Other sequenced annelids include Helobdella robusta and Capitella teleta, but these are not ecotoxicology models.
The species sequenced in this study, Enchytraeus crypticus, is a soil invertebrate belonging to the phylum Annelida, class Clitellata, order Oligochaeta and family Enchytraeidae (Fig. 1). Enchytraeids are the most important organisms in many habitats, dominant both in biomass and abundance 10 and ranging between 10² and 10⁵ individuals/m². They belong to the saprophagous mesofauna and play an important role in the degradation of organic matter. Contrary to many larger earthworms, which live in the humus or at the soil surface, enchytraeids inhabit the actual soil layer. Through their feeding activity, the soil assumes a fine-grained 'crumb' structure with an often higher stability than that of the bulk soil 11 . Enchytraeids are generally obligatory amphimictic hermaphrodites, but some species are able to reproduce by either parthenogenesis or self-fertilization. Most species reproduce sexually by means of egg and sperm production, cross-fertilization and cocoon deposition. E. crypticus can also reproduce via fragmentation: observations confirmed the regenerative ability in the posterior part (tail segments) after artificial amputation, whereas the anterior part was not able to regenerate 12 . One hypothesis is that autotomy can be used by this species as a self-defense mechanism in response to stress or injuries from physical or chemical stimuli, allowing detoxification and survival. E. crypticus is probably diploid, although this has not been confirmed.
Because of their relevance and sensitivity, enchytraeids are standard models when it comes to evaluating the environmental risk of human-made compounds 13 and have been used for >20 years for hazard assessment of chemicals. There are standardized protocols to assess survival and reproduction (ISO (International Organization for Standardization) and OECD (Organisation for Economic Co-operation and Development)) 14 , bioaccumulation 15-17 and avoidance 18 in enchytraeids, as well as vast arrays of other endpoints available.
There are few terrestrial environmental species with such a tool suite covering genotypic to phenotypic endpoints that are also ecotoxicological models. There has been impressive development in terms of molecular tools for E. crypticus, with a full transcriptome 19 and suite of omics tools available at present; these include customized microarrays with a wide range of transcriptomics applications 20-26 , proteomics 27 , metabolomics 28 and epigenetics 29,30 , with considerations of big data analysis and progress 24 . This ecotoxicology model species also includes phenotype-level endpoints for embryotoxicity 31 Table 2). From the 7,540 genes with gene ontology annotation, the distribution showed a majority of genes involved in molecular functions, followed by biological processes and cellular components (Fig. 2a).
Most of the genes are involved in binding and catalytic activity within molecular functions, while for biological processes, metabolic and cellular processes are the most represented, followed by regulation, response to stimulus and signaling. Further detail on each gene ontology (GO) term can be found in Supplementary Fig. 1.
De novo assembly and annotation of the E. crypticus mitochondrial genome. Because the full genome assembly did not contain a scaffold representing an intact mitochondrial genome, a separate assembly was attempted by using the Illumina paired-end reads only and specialized software. The resulting mtDNA of E. crypticus has a length of 15,205 bp. When searching for this sequence in the main genome assembly, two scaffolds containing fragmented copies of the mitochondrial genome were identified and removed from the assembly. Annotation of the mitochondrial genome detected a replication origin, 22 tRNA genes, 2 rRNA genes and 13 protein-coding genes, for a total of 37 genes (see MT (Mitochondrial) scaffold in Supplementary Table 1). The gene order is identical to that reported for Lumbricus terrestris 50 , with the exception of a non-coding segment located between trnH and nad5 instead of separating trnR from trnH. A map of the annotated mitochondrial genome is available in Supplementary Fig. 2.
Gene family analysis and orthogroups. The comparison between E. crypticus and eight other relevant species assigned 218,791 genes to orthogroups (~85% (>80%)) (Supplementary Table 3). A phylogenetic tree based on the orthogroup analysis is shown in Fig. 2b. The number of shared orthogroups between the four annelid species is represented in the Venn diagram (Fig. 2c). One would expect larger overlap between E. fetida and E. andrei, but E. fetida data are derived from a poor-quality assembly, and hence results can change substantially once quality increases. The list of significant expansions of gene families in E. crypticus, based on the z-scores, can be found in Supplementary Table 4  (see Supplementary Table 5 for the E. crypticus orthogroup protein description list). A total of 1,751 gene families were shared between E. crypticus and all the other eight species, with 104 being expanded in the E. crypticus genome (when including at least three more species in the comparison). The top 10 largest expansions (Supplementary Table 6) included long interspersed nuclear elements (LINEs) (129 genes), cytochrome P450 (44 genes), tripartite motif (TRIM) (26 genes), ankyrin (ANK; 19 genes), heat shock protein (16 genes), purple acid phosphatase (15 genes), paramyosin (11 genes), vitellogenin (VTG; 10 genes), caspase (10 genes) and hydroxy acid oxidase (HAO) (9 genes), besides several groups with unknown function (e.g., 54 genes in OG0000259). Other gene groups, such as inositol phosphate synthase (21 genes), potassium voltage-gated channel protein Shaw (11 genes) or nAChRbeta1:acetylcholine receptor subunit beta-like (10 genes) also showed high representation and are briefly integrated in the discussion.
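The text states that expansions were called from z-scores but does not specify how those scores were computed. As a minimal sketch, assuming one common convention (the focal species' gene count in an orthogroup, standardized against the counts of all compared species), the calculation could look as follows. Only the E. crypticus cytochrome P450 count (44 genes) comes from the text; the other eight counts, the species labels and the cut-off `Z_CUTOFF` are hypothetical placeholders.

```python
from statistics import mean, pstdev

# Hypothetical cut-off; the text does not state the threshold that was used.
Z_CUTOFF = 2.0


def family_zscore(counts, focal):
    """Standardize the focal species' gene count against all species in one orthogroup."""
    values = list(counts.values())
    spread = pstdev(values)  # population standard deviation across species
    if spread == 0:
        return 0.0  # identical counts everywhere: no expansion signal
    return (counts[focal] - mean(values)) / spread


# One orthogroup (cytochrome P450): 44 genes in E. crypticus is from the text,
# the eight comparison counts are invented for illustration only.
cyp_counts = {
    "E_crypticus": 44,
    "species_A": 10,
    "species_B": 12,
    "species_C": 8,
    "species_D": 11,
    "species_E": 9,
    "species_F": 10,
    "species_G": 12,
    "species_H": 10,
}

z = family_zscore(cyp_counts, "E_crypticus")
is_expanded = z > Z_CUTOFF
```

With nine species in the comparison, a single strongly expanded lineage inflates the standard deviation itself, so the cut-off matters; a leave-one-out variant (excluding the focal species from the mean and standard deviation) would be an equally defensible choice.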
With respect to lineage-specific gene families, we counted 307 orthogroups containing 1,370 genes unique for E. crypticus when compared to the eight selected species. An overview of the E. crypticus-specific orthogroups and their gene content can be found in Supplementary Table 7. Zinc fingers, one of the most abundant groups of proteins known for their wide range of molecular functions (transcriptional regulation, ubiquitin-mediated protein degradation, signal transduction, actin targeting, DNA repair, cell migration, etc.) 51 , were among the most represented. Another example included the sarcoplasmic calcium-binding protein, an invertebrate EF-hand calcium-buffering protein, suggested to have a similar function in muscle relaxation as vertebrate parvalbumin 52 .
Collinearity analysis. Intragenomic collinearity analysis detected 313 collinear genes in 25 syntenic blocks (see Supplementary Table 8 for a detailed list). Of those, one appears as an intra-scaffold palindrome on scaffold 15 (ANK) and another one as a tandem repeat in scaffold 129 (zinc finger) (Fig. 3).
Biosynthetic gene clusters. We used antiSMASH (v5.1.2) to identify biosynthetic gene clusters in the E. crypticus genome. The tool reports only one multi-gene cluster as a chemical hybrid of type I polyketide synthase and non-ribosomal peptide synthetase. The two genes in this cluster were also identified as horizontal gene transfer (HGT) candidates (ECRY_011785-RA, ECRY_011786-RA and malonyl CoA-acyl carrier protein transacylase, from the fatty acid biosynthesis) although not confirmed as HGT.
Hox genes. Based on similarity with UniProt and HomeoDB, we identified a total of 160 homeobox genes in the E. crypticus genome. Of these, 38 are members of the ANTP/HOXL class, which is involved in embryonic development. This number is comparable to that found in the recently assembled high-quality genome of Metaphire vulgaris 53 , another annelid. Supplementary Fig. 3 shows the distribution of the homeobox genes over the known classes. A complete list of identified Hox genes is presented in Supplementary Table 9. Manual assessment of synteny reveals that genes of the ANTP/HOXL class exist as multiple homologs located on several scaffolds. We do, however, notice a cluster of Hox1, Hox3, two Hox5 variants and a Hox7 gene on scaffold scf7180000023640.912933. A smaller cluster consisting of Hox1, Hox5 and Hox7 is present on another scaffold, scf7180000023512.337295. In both cases, the orientation is the same for all genes in the cluster.
HGT. By calculation of h-scores, 105 HGT candidates were initially identified; 33 of them were rejected because of the absence of native neighbor genes and long read linkage. Based on their low metazoan bitscore, five genes were confirmed to have been the result of HGT. The remaining 67 HGT candidates were subjected to a phylogenetic test and resulted in an additional 27 confirmed HGT genes, for a total of 32 genes (Supplementary Table 10). The origin of the confirmed HGT genes is represented in Fig. 4. Bacterial origin is detected for 59.4% of the HGT genes, followed by plants and fungi for 25.0% and 12.5%, respectively, and finally Archaea for 3.1%. A GO term enrichment analysis on the set of horizontally transferred genes yields 14 Biological Process (BP) terms and a single Molecular Function term (Supplementary Table 11).
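The candidate counts in the HGT paragraph can be checked with simple bookkeeping. The sketch below only re-derives the numbers reported in the text; the variable names are illustrative, and converting the reported percentages back to whole-gene counts by rounding is an assumption.

```python
# Candidate counts as reported in the text.
initial_candidates = 105
rejected_no_linkage = 33      # no native neighbor genes / long-read linkage
confirmed_by_bitscore = 5     # low metazoan bitscore
confirmed_by_phylogeny = 27   # passed the phylogenetic test

# Candidates left after the linkage filter, and those sent to the phylogenetic test.
remaining_after_rejection = initial_candidates - rejected_no_linkage
sent_to_phylogeny = remaining_after_rejection - confirmed_by_bitscore
total_confirmed = confirmed_by_bitscore + confirmed_by_phylogeny

# Reported origin shares, in percent of the confirmed HGT genes.
origin_percent = {"Bacteria": 59.4, "Plants": 25.0, "Fungi": 12.5, "Archaea": 3.1}

# Convert shares to whole-gene counts (rounding is an assumption of this sketch).
origin_counts = {
    origin: round(total_confirmed * share / 100)
    for origin, share in origin_percent.items()
}
```

Under these assumptions the shares correspond to 19 bacterial, 8 plant, 4 fungal and 1 archaeal gene, which add up to the 32 confirmed HGT genes.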

Discussion
Genome assembly. In this study, we produced the first high-quality genome for the oligochaete enchytraeid E. crypticus. The presented genome has good contiguity and completeness, as revealed by an N50 of 1.2 Mbp and an overall BUSCO score of 94.0%. The genome sequence, together with the currently over 18,000 identified genes, will allow exploration of the mechanisms underlying interactions with the worms' environment and its potential toxicants, organ development/regeneration, adaptation and evolutionary aspects.  54 . In invertebrates, genome size differences have been found to correlate with, for example, life cycle duration 55 and negatively with developmental rate; that is, species with multiple generations per year have smaller genomes (C-values) compared to species with one generation per year. When comparing the two enchytraeid species, the ice worm M. solifugus and E. crypticus (1,250 versus 525 Mbp), the former's twofold larger genome size can be due to fast mutational mechanisms or to natural selection. M. solifugus, a small and heavily pigmented enchytraeid, inhabits glacier areas in some of the coldest and highest UV radiation habitats on earth; it also has a much longer life span, living ~10 years, compared to ~1 year for E. crypticus 38 .
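For readers unfamiliar with the contiguity metric quoted above, N50 is conventionally defined as the length of the scaffold at which the running total of scaffold lengths, sorted from longest to shortest, first reaches half of the total assembly length. The following is a minimal sketch of that standard definition; the function name and the toy scaffold lengths are hypothetical and are not taken from the E. crypticus assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy example with invented scaffold lengths (total = 300).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
```

In the toy example the two longest scaffolds (80 + 70) already cover half of the total, so the N50 is 70; an N50 of 1.2 Mbp likewise means that half of the assembled sequence lies in scaffolds of at least that length.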
The enchytraeid family has an interesting trait regarding freeze tolerance: an RNAseq study in Enchytraeus albidus showed how the population from Greenland has specific transcriptional differences compared to the German population; both of these strains are freeze tolerant, but the Greenland population is extremely freeze tolerant 56 . The involved key processes are anion transport in the hemolymph, fatty acid metabolism, metabolism and transport of cryoprotective sugars as well as protection against oxidative stress, with peroxisome and toll-like receptor (TLR) signaling pathways being differentially expressed 56 . E. crypticus may be a well-adapted species for its life in the deeper soil layer, more buffered from variations compared to the upper layer, where other annelids, such as E. fetida and E. andrei (compost worms) or M. solifugus (ice worm), live. The fact that E. crypticus inhabits a less-variable environment than other worms may have reduced its gene bank source for adaptation (e.g., gene duplication) to cope with a changing environment.
Species-specific evolution and environment contribute to the end result of genome size and gene diversity. For instance, the E. crypticus genome (life span: 12 months; size: 6-9 mm; and genome size: 525 Mb) is more than twice as large as that of F. candida (life span: 5-8 months; size: 1-5 mm; and genome size: 220 Mb), a terrestrial arthropod, but the latter has ~10,000 more genes. Among other main differences, the arthropod F. candida is a parthenogenetic species, whereas the oligochaete E. crypticus mostly reproduces sexually, besides alternatives like regeneration. For the small crustacean Daphnia pulex (life span: 4-6 months; length: 1-5 mm; and genome size: ~200 Mb), gene duplication seems to be at the core of their evolutionary strategy 8 . Although there seems to be a trend, a larger number of genomes would be needed to allow such an analysis.
LAB ANIMAL | VOL 50 | OCTOBER 2021 | 285-294 | www.nature.com/laban
Collinearity. As mentioned, the arthropod F. candida is a parthenogenetic species, whereas the oligochaete E. crypticus mostly reproduces sexually. E. crypticus showed 313 collinear genes in 25 syntenic blocks, far fewer than the collembolan F. candida with 883 genes in 55 syntenic blocks 6 . Gene collinearity has been associated with parthenogenetic reproduction types; for example, the sexually reproducing collembolan Orchesella cincta does not show this pattern, and the parthenogenetic nematode Meloidogyne incognita has a mitotic cell division reproduction system 57 . Of the few intra-collinear genes in E. crypticus, zinc finger appears as a tandem repeat in scaffold 129 and is also found among the lineage-specific gene families, adding to its relevance. Zinc fingers, which can have many functions (e.g., binding DNA and RNA and being involved in transcriptional regulation), have also been found in F. candida in a palindrome 6 . In E. crypticus, scaffold 129 has 60 zinc finger genes in inter-collinearity with scaffold 52, besides the 7 intra-collinear tandem repeats.
Furthermore, ANK genes are present in an intra-scaffold palindrome, together with the protocadherin FAT4 and serine/threonine-protein kinase pak-1 (see the discussion section on ANK). For E. andrei, the LINE2 transposable elements and gene families were functionally related to regeneration (e.g., epidermal growth factor receptor); thus, LINE2 is potentially involved in regulating genes involved in regeneration. E. crypticus is known to regenerate 12 , although only its posterior end, whereas E. andrei regenerates both. Regeneration has not yet been studied at the genomic level for E. crypticus.
Like regeneration, embryogenesis is a stage of high cell proliferation; this has been studied at the transcriptomic level in E. crypticus embryos when exposed to cadmium (Cd) 22 . The down-regulation of pms1, a gene coding for a protein involved in DNA mismatch repair, was observed, suggesting that Cd affects DNA synthesis and repair in E. crypticus embryos. Cd also induced the down-regulation of several genes involved in cell cycle/cell division, including cell division cycle proteins and cell division protein kinases. Injured E. fetida showed wound-induced transcriptional activation of early growth response protein 1 gene (EGR1) 9 ; this could also be the case with E. crypticus. The epidermal growth factor receptor is a transmembrane receptor with tyrosine kinase activity that can regulate cell proliferation and differentiation. Hence, some of the mechanisms involved in regeneration also occur during embryogenesis, which is not surprising given the need for cell proliferation and differentiation in both events. Because earthworms are considered of great interest from the perspective of regenerative biology 9 , this can now be complemented by studies in E. crypticus; that is, the underlying mechanisms for regeneration 58 can now also be studied in enchytraeids (E. crypticus), which have a shorter life cycle than E. andrei.
TRIM. Recent studies have revealed that TRIM proteins play key roles in innate antiviral immunity. TRIM, expanded in E. crypticus, is a protein super-family conserved in metazoans that expanded rapidly during vertebrate evolution. There are more members in humans (65) and mice (64) than in worms (~20) and flies (<10). Many TRIM proteins are induced by type I and II interferons, which are crucial for resistance to pathogens, and several are known to be required for the restriction of infection by lentiviruses 59 .
Type I interferon induction is a central event of the immune response against viral infection, relying on the recognition of pathogens by cellular pattern recognition receptors (PRRs), which then trigger several signaling cascades resulting in pro-inflammatory cytokines and interferon production 60 . TRIM proteins are essential and act as restriction factors or by modulating PRR signaling. TLRs and other PRRs are engaged by bacterial, viral or fungal components, which triggers the innate immune responses. Although TRIM genes clearly arise from a common ancestral gene, they evolved independently, having acquired species-specific functions 59 . Invertebrates are exposed to a wide array of natural and anthropogenic threats with which the immune system has to deal. For instance, M. solifugus tolerates huge amounts of UV radiation compared to many other organisms to endure in the arctic ice and snow. Melanin synthesis, which gives M. solifugus its dark brown color, is known to be a central mechanism of innate immunity and a major response to various immune challenges, including UV. Part of the melanin synthesis pathway is catalyzed by the enzyme phenoloxidase; the phenoloxidase cascade produces melanin and induces multiple potent bioactive agents, such as peroxinectin and reactive oxygen species (ROS), that aid in phagocytosis and cell adhesion. E. crypticus, which has a milky transparent dermis, would have to cope with UV in a different manner.
Other examples include exposure to nanomaterial (NM) contamination, which also activates the innate immune system via different mechanisms 20,25,26 . NM recognition can first occur upon interaction with surface receptors-typically innate immune PRRs 61 . As NMs enter a biological environment, they become covered with a corona of proteins, sugars or other compounds. The coronas can mask the NM surface and prevent immune recognition. The importance of protein corona composition for NM recognition was studied in coelomocytes by using coelomic proteins (native repertoire) of the earthworm E. fetida compared to FBS (non-native reference) 62 . Over time, silver (Ag) NMs can competitively acquire a biological identity native to the cells in situ, although significantly greater cellular accumulation is observed with coelomic protein corona complexes, with lysenin having a key role. On the basis of the genome sequence, we can now look for similarities between E. crypticus and E. fetida, and we find that lysenin is present only in E. fetida. This is a case of species-specific formation of biomolecular coronas and suggests that the use of representative species may need careful consideration in assessing the risks associated with NMs. With our knowledge of the genome, it also means that a protein with similar function in E. crypticus can potentially be identified and possibly linked to the same initiating event.
As mentioned, various organisms may activate similar but not identical mechanisms for the recognition of and response to NMs. For instance, plant PRRs, similar to animal TLRs, recognize microbe- or pathogen-associated molecular patterns and trigger defense responses (e.g., ROS production, mitogen-activated protein (MAP) kinase activation and induction of defense genes). Worms (e.g., both E. fetida and E. crypticus) have a wide range of genes coding for extracellular recognition proteins (e.g., lectins, peptidoglycan-recognition proteins, lipopolysaccharide- and β-1,3-glucan-binding proteins and fibrinogen-related proteins), and any of these are good candidates for a function similar to that identified for lysenin, for example, in enchytraeids.
Research in innate and primed immunity in E. crypticus may open new horizons for developing strategies to prevent or combat infectious diseases, inflammatory conditions and autoimmune disorders. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a compelling case of innate immune hyperactivity 63 , causing acute respiratory distress syndrome. SARS-CoV-2 enters cells through the angiotensin-converting enzyme 2 (ACE2) receptor, which is expressed in a small set of alveolar type 2 epithelial cells. The gene coding for ACE is present in the E. crypticus genome, further supporting the potential of this species for immunology studies. Transcriptomic studies showed the activation of the ACE gene in E. crypticus as a response to stress when exposed to TiO2 NMs 21 and Ag NMs 25 . This activation was material specific; for example, the Joint Research Centre (JRC) reference TiO2 NMs NM105, NM104 and NM103 caused ACE upregulation when exposure took place under UVB light, whereas exposure to an Fe-doped TiO2 library without UV caused its downregulation. Ag materials caused upregulation of ACE upon exposure to AgNM and PVP-coated Ag 25 . As mentioned, it has been shown that NMs are handled as invaders by cells, like viruses, and can activate similar mechanisms 64 . A related observation is that SARS-CoV-2 disables interferons ('strikingly depressed interferon activity and elevated chemokines in individuals whose disease became severe and critical'); hence, a dosage of synthetic interferons to both healthy and infected people might help tame the disease 65 . Although there is more to coronavirus disease than innate response, E. crypticus may be useful for studying fundamental mechanisms (see the earlier discussion on interferons associated with the TRIM).
Caspases. Another expanded family in E. crypticus is the caspases, a group of cysteine-based proteases that are essential not only during apoptosis but also for the immune system. Work on the role of caspases in cell death has further revealed a caspase-driven compensatory proliferation, apoptosis-induced proliferation 66 , known to be involved in some forms of regeneration (as discussed above). Several NMs activate the NLRP3 inflammasome, inducing caspase-1 activation and the production of inflammatory IL-1β. Silica particles have been shown to induce caspase-1 activation and pulmonary inflammation.
In a parallel manner, a major known application of NMs is as antimicrobial agents (e.g., Ag and Cu NMs). The microbiota and immune functions are integrally linked; hence, studies should cover the impact on the interaction between bacteria and host immunity. The gut microbiome of E. crypticus has been shown to be altered when exposed to Cu materials 67 , shifting the communities to a decline in the relative abundance of Planctomycetes and an increase in Bacteroidetes, Firmicutes and Acidobacteria; antibiotic resistance genes in E. crypticus decreased significantly. The fungicide azoxystrobin also altered the abundance of core potential beneficial bacteria and increased the number and abundance of antibiotic resistance genes in the E. crypticus gut 68 , besides having a severe impact on survival and reproduction 69 .
Cytochromes P450 (CYPs). When it comes to stress response, cytochromes P450 (CYPs), a superfamily of enzymes found across most species, are important for hormone biosynthesis and the clearance of various compounds, oxidizing steroids, fatty acids, and xenobiotics. CYPs are expanded in E. crypticus; their expansion in other species such as Mesobuthus martensii (scorpion) 70 and F. candida 6 has been associated with their survival in hazardous environments and linked to feeding on phytotoxins from herbivorous insects or larva. Hence, they could be an important mechanism of adaptation to an environment (here, soil) where toxic compounds persist and accumulate in decaying soil organic matter; these toxic compounds can include plant anti-herbivory toxins, lignocellulose breakdown products and feeding deterrents.
Heat shock proteins (HSPs). The expanded HSP family comprises highly conserved proteins produced by cells in response to stress. For example, the HSP70 group consists of both constitutive and stress-induced HSPs as studied in E. crypticus 29 . One of the essential roles of HSPs under 'normal' conditions is to promote proper embryonic and postnatal development of multiple organ systems, particularly the nervous system 71 . A study covering embryo development and transcriptomics in response to Cd exposure showed no activation of HSPs in E. crypticus 22 , except for SSB1; this is in line with the finding that loss of SSB1 (combined with SSB2) impairs embryogenesis 72 . HSP70 induction as a response to stress has been shown in E. crypticus, for example, an increase after multigenerational exposure to copper and a turn-off when transferred to clean media 30 .
Purple acid phosphatases (PAPs). PAPs are metalloenzymes that catalyze the hydrolysis of phosphomonoesters and amide substrates. PAPs are highly conserved within eukaryotic species, although varying substantially between plants and animals 73 . Functional studies indicate that PAPs have flexible mechanisms. They are well known in mammals for their involvement in bone metabolism, and functions include iron transport and generation of ROS as an immune response. For plants, a speculated function is the mobilization/scavenging of inorganic phosphate from organophosphates in the soil. The fact that the PAP family is expanded in E. crypticus could be related to its soil habitat and the need to extract and manage its broad range of metallic elements, as well as the response to stress. PAP seemed to be a good candidate for HGT from plants, although we did not find evidence supporting this in our analysis. For instance, in F. candida there was HGT from the arbuscular mycorrhizal fungus (AMF) Rhizophagus irregularis, which facilitates its grazing by F. candida 6 . In return, these AMFs benefit from spreading and inoculation to other plants, and plants benefit from phosphorus uptake from AMFs: it is a tritrophic mutualism, contributing to soil health. F. candida was also the first animal discovered with penicillin biosynthesis genes in its genome 6,74 ; the isopenicillin N synthase gene is now also found in the E. crypticus HGT gene list. This suggests that these organisms have evolved to be well adapted in their soil habitat and have been able to develop antibiotic capacity in their microbe-dominated environment.
HAO. HAO (glycolate oxidase) 1 (HAO1) is a protein in the peroxisome encoded by the HAO1 gene in humans. HAO1 belongs to the superfamily of the alpha HAO enzymes. HAO1 catalyzes the flavin mononucleotide-mediated oxidation of glycolate to glyoxylate and glyoxylate to oxalate with reduction of oxygen to hydrogen peroxide; hence, it is central in the toxicity of ethylene glycol poisoning. The gene is primarily expressed in the liver and pancreas. Why HAO is expanded in E. crypticus is not obvious, but it could be for detoxifying functions, because response to stress seems to be prioritized in these organisms.
VTG. This is the major egg yolk precursor protein, which provides protein- and lipid-rich nutrients for developing embryos. The response of VTG to endocrine disruptive chemicals has been well studied in fish, where males can express the VTG gene in a dose-dependent manner. Invertebrates also possess an endocrine system 75 and VTG-like proteins, although this is poorly understood. The roles of VTG and its derived yolk proteins lipovitellin and phosvitin include host innate immune defense with various functions 76 . VTG could play a role in response to stress and innate immunity in E. crypticus, although further studies are needed to clarify this.
Paramyosin. Paramyosin has been found in invertebrate muscles, and it would make sense that its expansion in E. crypticus, a soil worm, relates to the strong muscle contraction required for movement and burrowing. Paramyosin is also a prominent antigen in human cysticercosis and may have a role as a modulator of the host immune response.
Potassium voltage-gated channel protein Shaw. Potassium voltage-gated channel protein Shaw, the Kv3 family, is highly represented in E. crypticus, and these proteins are important in shaping action potentials and in neuronal excitability and plasticity. In animal cells, the K+ channels are involved in neural signaling and generation of the cardiac rhythm, act as effectors in signal transduction pathways involving G protein-coupled receptors and may have a role in targeted cell lysis. Some K+ channels open in response to depolarization of the plasma membrane, others to hyperpolarization or an increase in intracellular calcium concentration; some can be regulated by binding of a transmitter with intracellular kinases or regulated by GTP-binding proteins or other second messengers.
Acetylcholine receptor. The acetylcholine (ACh) receptor (subunit beta-like 1) was well represented in E. crypticus. It binds ACh and responds by a change in conformation, which leads to opening of an ion-conducting channel across the plasma membrane. ACh and γ-aminobutyric acid (GABA; also present in E. crypticus) are among the group of neurotransmitters described in vertebrate and invertebrate nervous systems: ACh is a major excitatory transmitter, and GABA is a major inhibitory transmitter, both at the neuromuscular synapses and in the central nervous system. Several pesticides/insecticides (e.g., dimethoate) are designed to target the ACh pathway, and the impacts have been studied in regard to avoidance behavior, a relevant ecological trait that allows organisms to escape contaminated environments. Studies with E. crypticus showed an association between lack of avoidance of boric acid and an increase in the GABA receptor-associated protein, whereas acetylcholinesterase did not seem to be affected 77 . Non-avoidance of phenmedipham, however, seems to be associated with acetylcholinesterase inhibition in E. albidus 45,78 .
Inositol phosphate synthase. The expanded enzyme inositol phosphate synthase, which catalyzes the conversion of D-glucose 6-phosphate into 1D-myo-inositol 3-phosphate, is important for the production of inositol-containing compounds, including phospholipids (important for cell membrane formation and integrity), and hence for cell signaling and membrane trafficking. Mechanisms of cold adaptation or acclimation have been associated with changes in the membrane phospholipid composition, because membranes gradually undergo a transition from the liquid-crystalline to the gel phase as temperature decreases. The properties of membranes of E. albidus from seven populations (polar to temperate) have been studied, showing that the composition of phospholipid fatty acids varied significantly but that the 'optimal' fluidity of the membrane was apparently maintained 79 . The accumulation of glucose, a cryoprotectant, has been observed, and glucose could have a role in the fluidity of membranes.
ANK. ANK proteins, also expanded, are a family of adaptor proteins that link membrane proteins to the underlying cytoskeleton 80 . This is required to maintain the integrity of plasma membranes and to anchor specific ion channels, ion exchangers and ion transporters in the plasma membrane. Hence, both inositol phosphate synthase and ANK play a role in membrane integrity; this seems to be an important feature for E. crypticus, given the expansion of both gene families. In addition, ANK is required for the polarized distribution of many membrane proteins, including the Na+/K+ ATPase, the voltage-gated Na+ channel and the Na+/Ca2+ exchanger; hence, this must be an important regulation, because the K+ voltage-gated channel protein Shaw was also found to be expanded. The ANK genes may have been transferred from plants and fungi to E. crypticus via HGT (see Supplementary Table 10, from Phytophthora megakarya and Planctomycetes bacterium). As mentioned above, ANK genes are present in an intra-scaffold palindrome, together with the protocadherin FAT4, a calcium-dependent cell adhesion protein that plays a role in maintaining cell polarity. Other genes in the palindrome include serine/threonine-protein kinase pak-1 (which has important roles in cytoskeleton dynamics and cell adhesion) and Ras-related protein Rab-5C (cell transporter; e.g., vesicular traffic). Hence, this hairpin gene structure, with proximity between ANK, FAT4, protein kinases, etc., is not random and must aid in repairing and maintaining a key function. One could argue that these ionic stabilizers are important for the observed plasticity of enchytraeids to survive in aquatic biotopes, with many often living in marine interstitial environments, where the level of salts is much higher than in terrestrial soil. Studies have shown that the presence of low levels of salinity (15-20‰ NaCl) clearly improves the reproduction of E. albidus 81 , a species often found in large abundances at coastal shores among algae. Taking advantage of this asset from a toxicological perspective, an aquatic test has even been developed for enchytraeids 82 , and exposure to stressors in water is possible during a short period of time 83-85 , allowing researchers to screen effects via an aquatic exposure route.
Challenges and future research applications. With the genome available, one potential advance is the possibility of confirming hypotheses about the mechanisms underlying the response to stressors, often a missing link in ecotoxicology. This is feasible by using gene knock-out or gene knock-down (silencing) techniques that have been successfully demonstrated in other invertebrates, including CRISPR-Cas9 86 , transcription activator-like effector nucleases (TALENs) 87 and RNA interference 88 . Such development and creation of proof-of-concepts will have direct impact for regulation, for example, in Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) and in developing adverse outcome pathways 89 , where the causality between transcriptomics and impacts on the phenotype remains one of the main sources of uncertainty for their wider usage and implementation.
Another important future direction opened by the genome will be the study of the epigenome, one of the major regulators of observed effects and of their environmental linkage. Although epigenetics has received vast attention for some species, this is not the case for invertebrates, and even less for environmentally relevant species. With the availability of the complete genome of E. crypticus comes the possibility of applying cheaper, more-feasible and/or more-targeted epigenetic genotyping tools, such as Methylated DNA Immunoprecipitation Sequencing (MeDIP-seq), Methyl-CpG-binding domain sequencing (MBD-seq) and epigenetic microarrays. These tools can be used to study how the genome is accessed in different cell types and during development and differentiation. They can also provide valuable information on how the organism reacts at the molecular level to environmental changes. For example, we will be able to learn much more about innate immune memory and priming, information also relevant for humans.

Conclusions
The first high-quality draft genome for E. crypticus was sequenced and assembled, resulting in a 525-Mbp genome, with currently >18,000 identified genes, good contiguity and completeness. Evidence suggests that E. crypticus may be a well-adapted species in its environment, and its genome adaptation and evolution can now be explored. Expanded gene families showed that the genome evolved to respond to stress (CYP) and to develop the innate immune system (TRIM), which are often activated via connected mechanisms. Its capacity to regenerate (LINEs) is a very interesting asset, and although regeneration has been found to be inversely related to the evolution of the innate immune system, successful regeneration requires an adequate immune response. There is an obvious potential for using E. crypticus as a model to study interactions between regeneration, the innate immune system and its aging-dependent decline. Last, the possibility of studying embryo development, a stage of high cell proliferation like regeneration, and the ability to link genes to phenotypic effects represent major advantages of working with E. crypticus. Available transcriptomics studies have linked response to stress to genome features. The potential for future research now includes hypothesis confirmation via gene knock-out and epigenetics.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41684-021-00831-x.