Transcriptome profiling and in silico detection of the antimicrobial peptides of red king crab Paralithodes camtschaticus

Endogenous antimicrobial peptides (AMPs) are evolutionarily ancient factors of innate immunity, which are produced by all multicellular organisms and play a key role in their protection against infection. Red king crab (Paralithodes camtschaticus), also called Kamchatka crab, is widely distributed and is the best known of all king crab species belonging to the family Lithodidae. Despite their economic importance, the genetic resources of king crabs are scarcely known and no full-genome sequences are available to date. Therefore, analysis of the red king crab transcriptome and identification and characterization of its AMPs could contribute to the development of novel antimicrobial drug candidates at a time when antibiotic resistance has become a global health threat. In this study, we sequenced the P. camtschaticus transcriptomes from carapace, tail flap and leg tissues using an Illumina NGS platform. Libraries were systematically analyzed for gene expression profiles along with AMP prediction. Using an in silico approach based on public databases, we defined 49 cDNAs encoding AMP candidates belonging to diverse families and functional classes, including buforins, crustins, paralithocins, and ALFs (anti-lipopolysaccharide factors). We analyzed expression patterns of 27 AMP genes. The highest expression was found for Paralithocin 1 and Crustin 3, with more than 8,000 reads. Other paralithocins, ALFs, crustins and ubiquicidins were among medium-expressed genes. This transcriptome data set and the AMPs provide a solid baseline for further functional analysis in P. camtschaticus. Results from the current study also contribute to the future application of red king crab as a bio-resource, in addition to its being a known seafood delicacy.

Natural peptides can become prototypes of new broad-spectrum antibiotic candidates, which are urgently needed to tackle the challenge of antimicrobial resistance, a serious threat globally. The development of resistance to AMPs is rather difficult, since it requires major changes in the structure and electrophysiological properties of the cell membrane 12 . AMPs increase the permeability of the cell membrane, thereby enhancing the effect of traditionally used antibiotics, and can therefore be used in combination with them 11,13 . According to the APD database (https://aps.unmc.edu/AP), about 3,073 natural AMPs have been discovered to date, and at least a third of them are from invertebrates 14 . Marine invertebrates in particular could be a valuable source of new AMPs.
The red king crab (Paralithodes camtschaticus), also called Kamchatka crab or Alaskan king crab, is the best known species of all king crabs. It is one of the largest arthropods, has a crab-like morphology and a strongly calcified exoskeleton with spines 15 , and belongs to the family Lithodidae (Anomura). The red king crab is native to the Bering Sea, around the Kamchatka Peninsula and neighboring Alaskan waters, and is also present in the Barents Sea (www.wikipedia.org). It became established in the latter in the mid-1990s after introduction to the area in the 1960s 16 . The red king crab is a highly valued delicacy on the international market and currently contributes significantly to the income from fisheries in the regions where it is harvested 17,18 .
In addition to being a seafood delicacy in high demand worldwide, red king crab has unexploited potential value as a source of new AMPs from a cold-water marine organism. To date only a few studies have been reported 19,20 . Moreover, the genetic resources of red king crabs are scarcely known 21 and no full-genome sequences are available, strongly suggesting that effort should be devoted to investigating red king crab to reveal its potentially valuable genes, both for fundamental understanding and for developing the bio-based economy. In this study, we sequenced the transcriptomes of various parts of P. camtschaticus (top shell, soft belly tissue, leg meat and leg shells), using Illumina high throughput sequencing technology. Sequences were submitted to NCBI GenBank (GHGY00000000.1). Bioinformatic analysis focused on putative AMPs with complete coding sequences (cds), and among these were three paralithocin genes with antimicrobial properties, defined and described recently 20 . Further identification of novel AMPs from red king crab could be of potential importance for extending the list of AMPs and contributing to future solutions for combating antimicrobial resistance. This transcriptome data set and the AMPs provide a solid baseline for further functional analysis of P. camtschaticus.

Results and discussion
Transcriptome sequencing and assembly. The de novo transcriptomes of four types of tissues of two individuals of the red king crab P. camtschaticus (top shell, soft belly tissue, leg meat and leg shells) were sequenced on an Illumina HiSeq 3000/4000 using two lanes and 2 × 150 bp (paired-end) reads. This resulted in a draft gene set of 384,854 transcripts and 274,917 uni-genes of red king crab. Additionally, Augustus v3.3.2 was used to find full length transcripts. The uni-genes were annotated using a combination of blastx searches against several databases, such as nr, TrEMBL and Swiss-Prot, and ID mapping to eggNOG, KEGG and GO. The distribution of GO terms for red king crab biological processes, molecular functions and cellular compartments is presented in supplementary figure S1.
In silico identification of AMPs. We used full length transcripts from red king crab to search for similarity to putative AMPs, and found 30 AMP candidates belonging to diverse families and functional classes. For 27 of the predicted proteins we found full length gene models and for 7 just fragments, without 3′ or 5′ ends. Additionally, we found 19 considerably longer gene models (up to 2 kb), containing sequence motifs similar to known AMPs, often in several copies. The AMPs of red king crab predicted in silico are presented in Table 1 and a full description is presented in Table S1.
Below we characterize the most interesting groups of AMPs in red king crab. Buforins are a large group of AMPs derived from the N-terminal region of histone H2A that interacts directly with nucleic acids. Buforin I is generated from histone H2A by pepsin-directed proteolysis in the cytoplasm of gastric gland cells and provides antimicrobial protection in the stomach of the Asian toad Bufo gargarizans 22 . Natural buforin can be further treated with endoproteinase Lys-C to produce truncated versions of AMPs (21 aa) with higher antimicrobial activity. Buforins, which contain a helix-hinge-helix domain, kill microorganisms by entering the cell without permeabilizing the membrane and then binding to nucleic acids. The proline hinge is crucial for the cell penetrating activity of buforins 22,23 . We found three gene models encoding buforin-like AMPs with high similarity to the toad's Buforins I and II (Fig. 1).
There are two other groups of AMPs derived from histone H2A with sequence similarities to buforin: acipensins, isolated from leukocytes of the Russian sturgeon Acipenser gueldenstaedtii 24 , and hipposins, isolated from Atlantic halibut 25,26 . We found ten contigs encoding proteins similar to histone H2A and to buforin or acipensin AMPs. Two of them were more similar to Buforin 1 (AP00308) and eight to acipensins (AP02813). Most acipensins were not full-length proteins but fragments (Table S1).
Another group of AMPs is the ubiquicidins, cytosolic antimicrobial proteins that are identical or highly homologous to the ribosomal protein S30, where the precursor element shows homology to ubiquitin 27 . They preferentially bind to the bacterial cell membrane at the site of infection 28 . Two ubiquicidins with ORF lengths of around 130 aa were found in the red king crab transcriptome.
Lysozyme is a naturally occurring enzyme found in bodily secretions such as tears, saliva, and milk. It functions as an antimicrobial agent by cleaving the peptidoglycan component of bacterial cell walls, which leads to cell death 29 . Owing to its distinct antimicrobial properties, lysozyme has been used effectively in the food industry 30 . We found two contigs, one short and one long, containing a lysozyme gene with a full length ORF and an active region of 131 aa. Similarly, lysozyme-encoding genes named lys1 and lys2 have also been characterized in nematodes (Clarke et al. ongoing work/unpublished data).
Crab Paralithocins 1-3 were assigned to a previously unknown family of Cys-rich antimicrobial peptides with limited antimicrobial functions against marine bacteria 20 . The full length sequence of Paralithocin 1 was obtained by joining TRINITY_DN18_c0_g2_i4 and TRINITY_DN349_c20_g1_i2, with the ORF found at positions 85-264. The full length protein has a size of 62 aa and contains a signal peptide (MGPMKVLLVLLVVMVAAPHIADA) with the cleavage position at 23-24 aa (ADA-WQ). There is one aa substitution in comparison with published data (Fig. 1). The full length sequence of Paralithocin 2 was obtained by joining TRINITY_DN26637_c1_g2_i2 and TRINITY_DN1423_c0_g1_i2, with the ORF found at positions 148-363. The full length protein has a size of 72 aa and contains a signal peptide (MGAAKVLLVVLAVMVAVPNLAEG) with the cleavage position at 23-24 aa (AEG-RS). There is one aa substitution at position 5 in comparison with published data (Fig. 2). We detected another AMP very similar to Paralithocin 2, which we called Par2.2. The full length sequence of Paralithocin 3 was obtained from TRINITY_DN5503_c0_g1_i1, with the ORF found at positions 59-289. The full length protein has a size of   (Fig. 2B). No differences between predicted and published data were found for Paralithocin 3 (Fig. 2C).
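The reported signal peptides and cleavage positions for Paralithocins 1 and 2 can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the study's pipeline: the dictionary and function names are hypothetical, and only the sequences and precursor lengths quoted above are used. It counts the residues left after removal of the signal peptide and does not account for any further processing of the peptide.

```python
# Signal peptide sequences and full length precursor sizes as quoted in the text.
# The names PRECURSORS and residues_after_cleavage are illustrative only.
PRECURSORS = {
    "Paralithocin 1": ("MGPMKVLLVLLVVMVAAPHIADA", 62),
    "Paralithocin 2": ("MGAAKVLLVVLAVMVAVPNLAEG", 72),
}


def residues_after_cleavage(precursor_length: int, signal_peptide: str) -> int:
    """Number of residues remaining once the signal peptide is removed."""
    if len(signal_peptide) > precursor_length:
        raise ValueError("signal peptide is longer than the precursor")
    return precursor_length - len(signal_peptide)


if __name__ == "__main__":
    for name, (signal, length) in PRECURSORS.items():
        remaining = residues_after_cleavage(length, signal)
        # A signal peptide of n residues implies cleavage between n and n + 1.
        print(f"{name}: cleavage after residue {len(signal)}, "
              f"{remaining} aa remain")
```

Both signal peptides are 23 residues long, which agrees with the stated cleavage position between residues 23 and 24.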
Paralithocins 1 and 3 showed blast similarity (E-value around 1e-4) to mytilin (ADC29474, ADC29471), a protein with antimicrobial functions from the sea mussel Mytilus coruscus. Paralithocin 2 showed weak blast similarity (E-value around 1e-3) to defensin proteins.
Crustins are cationic cysteine-rich AMPs with a leader/signal sequence at the N-terminus and a single whey acidic protein (WAP) domain at the C-terminus, and are expressed by the circulating haemocytes of crustaceans 31 . Crustins from the haemocytes of black tiger shrimp (Penaeus monodon) 31 or from the freshwater prawn Macrobrachium rosenbergii 32 exhibit potent antibacterial activity against several gram-positive and gram-negative bacteria from the environment. A single crustin was detected earlier in the haemocytes of red king crabs together with spider crabs 19 . We found eight typical crustins in the red king crab transcriptome with signal peptides (except two 3′ fragments) and a distinct WAP domain. They showed high similarity (E-value < 1e-20) to crustins from other crabs and marine species (Fig. 3).
We found one protein with similarity to an AMP from pigs, enhancer of rudimentary homolog (AP00208), also called peptide 3910, which has antibacterial properties. It is a small protein (103 aa) containing an ER domain (PF01133) and an antimicrobial region of 29 aa, defined here as Paralithocin 4.
Furthermore, we identified two proteins containing Kazal-type serine protease inhibitor domains (PF00050), similar to the human skin-derived AMP SPINK9. SPINK9 is an epidermal antimicrobial peptide that selectively kills E. coli and might contribute to the innate barrier function of human skin 33 .
Anti-lipopolysaccharide factor (ALF) is a small protein with broad-spectrum antimicrobial activity against gram-negative and gram-positive bacteria and filamentous fungi, which has potential application in disease control. ALF was originally identified in horseshoe crabs and was recently found in several shrimp species 34,35 . Different ALFs have a conserved cluster of positively charged residues within the disulfide loop between two conserved cysteine residues, which is usually called the lipopolysaccharide (LPS)-binding domain (DUF3254) and is considered to be the vital functional domain 34,36 . We defined four ALFs in red king crab. All of them possess signal peptides and a typical DUF3254 domain, and they are quite different in amino acid composition.
Sequences matching AMPs are often just part of a large ORF, which corresponds to a large protein coding gene. For example, the Scolopendin AMP (AP02453) was found in nearly 30 contigs that contain ORFs of cyclin-dependent kinase 2 (e.g. TRINITY_DN2174_c0_g1_i1). Similarly, AP02797_cOT2 (29 aa), found in TRINITY_DN90639_c0_g1_i1, is a part of a Transferrin protein (Penaeus vannamei) of 350 aa. Thus, short AMPs could be processed from longer peptides. Such genes were also considered as AMP candidates. We defined 19 such large proteins with similarities to YFGAP (Yellowfin Tuna GAPDH-related Antimicrobial Peptide), BPTI (Bovine Pancreatic Trypsin Inhibitor), Scolopendin 1, SJGAP (Skipjack Tuna GAPDH-related AMP), transferrins and some AMPs from trout. Active short AMPs could be produced by posttranslational processing, proteolysis or degradation of such longer peptides 4 , but this remains to be demonstrated experimentally. They are presented in Tables 1 and S1.
To evaluate the defined AMPs, we also ran the prediction tool of the CAMP R3 database 37 , which confirmed that most of the defined proteins are AMPs (Table S2). At the same time, the putative AMP properties of PcPar4 (Paralithocin 4) and PcKaz1 (Kazal-type serine proteinase inhibitor 1), and of two large proteins, Scolopendin 1 (DN160752_c0_g1_i1) and BPTI (DN22010_c0_g1_i3), are yet to be confirmed. This may be because these peptides are novel or weak AMPs. Future studies should focus on the characterization of these AMPs.
Analysis of selected AMP gene expression. Constitutive expression of selected candidate AMP-encoding genes was assessed from RNA-seq data (Table S3). We tested the RPKM expression of some AMP genes in three independent king crabs with RNAs from carapace, tail flap, leg flesh, and legs containing both shell and flesh (Fig. 4, Table 2). In total we analyzed expression patterns of 27 AMP genes. Nineteen genes were highly expressed with more than 3,000 reads in all libraries, five genes were expressed with around 1,000 reads, and three genes had fewer than 100 reads and were low-expressed. The highest expression was found for Paralithocin 1 and Crustin 3 with more than 8,000 reads. Other paralithocins, ALFs, crustins and ubiquicidins were among medium-expressed genes.

Conclusions
In conclusion, transcriptome profiling and in silico annotation were very effective for defining candidate AMPs in the red king crab transcriptomes. Gene models assembled from transcriptome data will be a valuable resource for future studies, including functional annotation of the red king crab genome. In our study we identified a significant number of AMPs as candidates for future studies of antimicrobial activity. These candidate AMPs will be tested for production of recombinant enzymes in plants or algae and for antimicrobial activity against marine microorganisms (to substitute antibiotics or increase their efficiency for salmon production).

Materials and methods
RNA was extracted from four tissues from three adult individuals; see Table 2 for sample description. Total RNA was extracted from each sample with the Direct-zol RNA Miniprep Plus kit (Zymo, R2071) according to the manufacturer's instructions after treatment with RNase-free DNase I (Qiagen) to eliminate genomic DNA. The concentration and integrity of the RNA were assessed with a Thermo Scientific NanoDrop 8000 Spectrophotometer and an Agilent 2100 Bioanalyzer (Agilent Technologies, USA), respectively.
Transcriptome sequencing and assembly. For library preparation, ~1 μg of total RNA was used.
Sequencing and library preparation were performed by the Norwegian Sequencing Centre (NSC) in Oslo.
The raw data were trimmed using Trimmomatic v0.38. Clean reads were assembled de novo using Trinity v2.8.4 38 , and then redundancy was reduced using cd-hit-est v4.7. The resulting draft gene set and uni-genes were preliminarily annotated by looking for similarities to sequences in several databases, such as nr, TrEMBL and Swiss-Prot, and by ID mapping to eggNOG, KEGG and GO. Full length gene models of the Trinity transcripts were predicted using Augustus v3.3.2 39 .
RPKM (reads per kilobase of transcript per million mapped reads) values were used to identify expression patterns of the selected AMP genes in the different tissues (Fig. 4).
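For reference, RPKM normalizes a raw read count by transcript length and by sequencing depth. A minimal Python sketch of the standard formula is given below. The function name and the example numbers are illustrative assumptions and are not taken from the study's data or software.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical example: 8,000 reads on a 500 bp transcript
    # in a library of 20 million mapped reads.
    print(rpkm(8000, 500, 20_000_000))
```

Because the read count is divided by transcript length, a short AMP transcript and a long housekeeping transcript with the same number of reads receive very different RPKM values, which is why raw read counts and RPKM should not be compared directly.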
AMP prediction. For prediction of AMPs from the transcriptomes of red king crab we used assembled contigs, full length gene models and AMP data from several databases. We screened our comprehensive transcriptome datasets using several public AMP databases and tools 14,40,41 . We used two approaches to define AMP candidates. Initially, we used the protein sequences of the obtained full length gene models in a blastp (protein-protein BLAST) search against the AMP database data. This approach was not effective for very short AMP proteins.
In the second approach, we made a tblastn (protein-translated nucleotide sequence BLAST) search of AMPs from all four databases against assembled contigs translated into proteins in all six reading frames using CLC Genomics Workbench 10.0 (https://www.qiagenbioinformatics.com/). The tblastn results were filtered with a similarity score ≥ 90. Sequences with observed similarity at the given cutoff values were considered as AMP candidates. Finally, candidate AMP contigs were manually searched for ORFs containing continuous stretches of amino acids of described AMPs, and for translation start and stop codons, to define the full length or fragment protein sequences. Proteins defined in this way were further analyzed for the presence of signal peptides and functional domains. Similarity to known AMPs was additionally confirmed by blastp against the NCBI database and by the AMP prediction tool of the CAMP R3 database 37 .
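The idea behind the six-frame translation and the search for continuous AMP stretches can be illustrated with a short, self-contained Python sketch. It is only an illustration under stated assumptions: the study used tblastn and CLC Genomics Workbench, whereas this sketch uses the standard genetic code and exact substring matching in place of a scored alignment. All function names and the toy contig are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(contig: str) -> dict:
    """Return {(strand, offset): protein} for the three forward and three reverse frames."""
    contig = contig.upper()
    frames = {}
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def find_peptide(contig: str, peptide: str) -> list:
    """List (strand, offset, aa_position) of exact peptide matches in any frame."""
    hits = []
    for (strand, offset), protein in six_frame_translations(contig).items():
        pos = protein.find(peptide)
        if pos != -1:
            hits.append((strand, offset, pos))
    return hits


if __name__ == "__main__":
    # Toy contig: two leading bases, then codons for M-K-V-L-A and a stop codon.
    toy_contig = "GG" + "ATGAAAGTTCTGGCT" + "TAA"
    print(find_peptide(toy_contig, "MKVLA"))
```

Exact matching will miss candidates that differ from a database AMP by even a single residue, which is one reason an alignment-based search such as tblastn with a similarity cutoff is used in practice.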