Introduction

Since the completion of the Human Genome Project in 2003, there has been a proliferation of high-quality genomic and transcriptomic resources for non-model species1,2,3. The ability to generate these ‘multi-omics’ resources in diverse species has clear comparative4, conservation5,6, and clinical benefits7. As of July 2023, over 5800 non-model animal genomes and 4600 non-model animal transcriptomes were available in the largest global genome repository, the National Center for Biotechnology Information (NCBI)8. Despite this acceleration in the generation of genomic resources, only 46 genera within the Order Anura (10% of all frog genera) have genomic or transcriptomic resources available on NCBI8,9. In Australia, despite 92% of frogs being endemic, only 4 of the 229 known endemic frog species currently have ‘omics-level resources available9,10,11,12,13.

The global disparity in genetic resources across taxa has been driven in part by the diverse structures of tetrapod genomes and associated difficulties with sequencing and assembly12. While frogs have relatively stable chromosomal structures compared to other tetrapod groups14, they have highly variable genome sizes10,15,16, variable proportions of repetitive elements11,17, and can show ploidy variation18. As a result, the model Xenopus laevis genome was one of the only amphibian chromosome-scale assemblies prior to long-read sequencing19. More recent advances in long-read sequencing technology20, alongside chromosome conformation capture techniques21, have since facilitated more contiguous assemblies of these challenging genomes10,22.

An emerging application of genomic resources is the bioinformatic discovery of novel antimicrobial peptides (AMPs). AMPs are small peptides found across all classes of life that form an important part of the innate immune response23,24,25. The comprehensive characterisation of AMPs is valuable in understanding how different species respond to pathogens in their environment, and their pleiotropic immune effects have also been explored as potential therapeutics, with several currently undergoing clinical trials26,27. Frogs are known to express AMPs from the exocrine glands of their skin in response to their exposure to diverse microbial communities in their amphibious, freshwater habitats28,29,30. Over 2500 novel AMPs have been characterised from only 167 frog species on the Database of Anuran Defence Peptides v1.6 (DADP)31. To date, frog AMPs have typically been characterised with a peptidomic approach: the iterative purification of AMPs showing antimicrobial activity from frog skin secretions using high-performance liquid chromatography (HPLC)32. However, this method may not capture the suite of AMPs expressed in other tissues33, nor AMPs with immune functions other than antimicrobial activity34. In response, there has been increased interest in ‘mining’ genomic and transcriptomic resources for the genes that encode AMPs35. These methods are primarily homology-driven, using known sequences of large, well-characterised AMP families like cathelicidins and β-defensins to search for homologs36. A genomics-driven approach has been applied to discover novel AMPs from a range of non-model species35,37,38,39. However, despite the evolutionary divergence of Australian frog families (Limnodynastidae and Myobatrachidae diverged from South American frogs an estimated 80–100 million years ago)40,41, none have yet been investigated for AMPs using a bioinformatic approach.

The southern stuttering frog (Mixophyes australis)42 is a small, ground-dwelling frog inhabiting naturally vegetated riverbanks in south-eastern Australia, and is part of the Myobatrachidae family of endemic Australian ground frogs43,44. The northern (Mixophyes balbus) and southern (Mixophyes australis) species of stuttering frog were once considered a single species, but were recently split42. Although yet to be officially assessed, application of the International Union for the Conservation of Nature (IUCN) threat assessment methods for the southern stuttering frog warrants a listing of ‘Endangered’ for this newly defined species42. As a ground-dwelling frog, southern stuttering frogs spend most of their life cycle exposed to large bodies of water and decomposing detritus, which may carry pathogens such as chytrid fungus (Batrachochytrium dendrobatidis) and Ranaviruses42,45. We hypothesise that the selective pressures of the southern stuttering frog’s environment make it a good candidate for the discovery of novel and diverse AMPs. Preliminary studies on other Mixophyes species demonstrated some resistance against chytrid fungus in skin secretions46,47. While these secretions have not been sequenced, these studies suggest possible AMP activity in the Mixophyes genus.

In this study, we use a combination of PacBio HiFi long-read, short-read, and Hi-C sequencing data to generate the first genomic and transcriptomic resources for the southern stuttering frog. Using these high-quality resources, we use bioinformatics to characterise and analyse the AMP families cathelicidins and β-defensins. By generating the first ‘omics level resources for the Mixophyes genus, this study provides a unique opportunity to investigate an otherwise largely understudied, Australian taxon.

Results

Multi-omics resources for the southern stuttering frog

Using a combination of PacBio HiFi long-read and Hi-C sequencing data, we generated a high-quality genome assembly for the southern stuttering frog (Fig. 1A). The genome is 3.13 Gbp in length, has a read coverage of 26× and a scaffold N50 of 369 Mbp (Table 1). Repeat-masking the genome revealed a high proportion of repetitive elements (53.7%; Supplementary Table S1), within the broad expected range for Anuran genomes (32.0–77.1%;8). The GC content of the genome, 40.3%, was also within the range of expected Anuran GC values (26.9–44.5%;8). From the assembly, we identified a scaffold representing the mitochondrial genome, which contained 36 genes: 21 tRNAs, 2 rRNAs and 13 protein-coding genes (Fig. 1B).

Figure 1

The southern stuttering frog (Mixophyes australis) genome assembly. (A) The 12 longest southern stuttering frog genome scaffolds, ordered by length. Red dots indicate the presence of telomeric sequences at the end of scaffolds. Horizontal, black lines indicate contig joins informed by the Hi-C sequencing data. (B) Circos plot of the southern stuttering frog mitochondrial genome generated using Proksee48. Outer ring contains genes on the forward strand, while the inner ring contains genes on the reverse strand.

Table 1 Statistics generated from the genome and transcriptomes of the southern stuttering frog (Mixophyes australis; this study), the model Xenopus species and three Australian frog genome assemblies.

Over 95% of the genome was assembled in twelve scaffolds (Fig. 1A; Supplementary Fig. S1), which likely represent the chromosomes based on previous karyotypic analysis of other Mixophyes species (2n = 24), including M. hihihorlo50, M. fasciolatus and M. schevilli51. Six scaffolds were flanked at each end by long, telomere-like repeats, while four scaffolds had telomere-like repeats on a single end (Fig. 1A). Additionally, a substantial drop in scaffold length between the 12th and 13th scaffold (94.13 Mbp vs. 4.75 Mbp) was noted, further suggesting 12 chromosome-level scaffolds.
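The check for telomere-like repeats at scaffold ends can be illustrated with a short Python sketch. It is not the FindTelomeres implementation: the function names, the 1000-bp end window and the threshold of ten tandem hexamers are assumptions chosen for illustration, and only the canonical TTAGGG/CCCTAA hexamer is taken from the Methods.

```python
import re


def has_telomere(end_seq: str, min_repeats: int = 10) -> bool:
    """Return True if end_seq holds a tandem run of at least min_repeats hexamers.

    The hexamer pair TTAGGG/CCCTAA is the canonical vertebrate telomere repeat;
    min_repeats is an illustrative threshold, not a published setting.
    """
    pattern = r"(?:TTAGGG){%d,}|(?:CCCTAA){%d,}" % (min_repeats, min_repeats)
    return re.search(pattern, end_seq.upper()) is not None


def telomere_ends(scaffold: str, window: int = 1000, min_repeats: int = 10):
    """Report telomere-like runs in the first and last `window` bases.

    Returns a (start_has_telomere, end_has_telomere) tuple of booleans.
    """
    return (
        has_telomere(scaffold[:window], min_repeats),
        has_telomere(scaffold[-window:], min_repeats),
    )
```

Under these assumptions, a scaffold returning `(True, True)` would correspond to one flanked by telomere-like repeats at each end, and `(True, False)` or `(False, True)` to one with repeats at a single end.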

To compare the contiguity and completeness of the genome to other frog genomes, we calculated several genomic statistics, including benchmarking universal single-copy orthologs (BUSCO)49 using the vertebrata_odb10 lineage. Genome statistics were calculated for two model frog species and the three publicly available scaffolded Australian frog genomes (Table 1). Our BUSCO analysis revealed that our assembly contained 91.8% of complete BUSCOs, 2.9% were present but fragmented, while 5.3% were missing. The proportion of complete BUSCOs in the southern stuttering frog is slightly lower, but still comparable to the model reference X. laevis (95.9%). An unbiased assessment of whole genome completeness with merqury52 showed that our genome assembly was highly complete (93.5%) and accurate (Q60+).

Using Illumina short-read RNA sequencing, we generated six reference-guided, tissue-specific transcriptome assemblies (dorsal skin, ventral skin, liver, spleen, brain, and gonads) that were merged into a global transcriptome. All tissues had high mapping rates to the genome (78.62–89.37%), with both the tissue-specific and global transcriptomes yielding high complete BUSCO scores (84.0–96.1%; Supplementary Table S2). A total of 36,540 genes were annotated using FGENESH++53. Of these, 16,355 were annotated using evidence from the coding regions of the global transcriptome, similar to the 16,279 predicted coding regions provided as input. A further 12,892 genes were annotated via homology to non-redundant metazoan proteins from NCBI, and 7293 were annotated ab initio. Proteins from these annotated genes yielded a complete BUSCO score of 86.0%, comparable to the genome annotation of Limnodynastes dumerilii, and lower than the Xenopus spp. annotations (Supplementary Table S2). However, a large proportion of the stuttering frog annotated genes were fragmented (8.5%), similar to L. dumerilii (8.0%).

Characterisation of cathelicidins and β-defensins

We used the genome and transcriptome assemblies of the southern stuttering frog to characterise two families of AMPs: cathelicidins and β-defensins. These families were selected due to their conservation across vertebrate species54,55, highly conserved structures and motifs56,57,58 and demonstrated antimicrobial potency57,59. We identified 12 cathelicidin (MA-CATH1-12) and two β-defensin (MA-BD1 and MA-BD2) genes in the genome using homology-based search strategies. Full-length transcripts of all cathelicidins and β-defensins were identified in the global transcriptome. All putative AMPs contained the characteristic features of each family, including expected exon number (four for cathelicidins and two for β-defensins), conserved amino acid residues, motifs, and domain structure (Supplementary Figs. S2 and S3). As expected, each AMP family was encoded in clusters within the genome; all cathelicidins were located on scaffold 5 and all β-defensins on scaffold 2 (Supplementary Fig. S4). Cathelicidin and β-defensin genes were named in the order in which they are encoded within the genome (Supplementary Table S3; Fig. S4). Amino acid sequences of all AMPs identified in this study, with predicted signal peptide and mature peptide domains are provided in Supplementary Figs. S2 and S3.

Properties, evolutionary relationships and expression patterns of cathelicidins and β-defensins

The mature peptide domains of cathelicidins and β-defensins are the bioactive, antimicrobial portion of the peptide. In the southern stuttering frog, the mature peptides of the cathelicidins ranged between 10 and 161 amino acids in length (Supplementary Fig. S2), while the two β-defensins were both 47 amino acids (Supplementary Fig. S3). The mature peptides of AMPs generally have a high positive charge, which facilitates electrostatic interaction and attachment to microbial cell membranes24. Stuttering frog cathelicidins MA-CATH1, 2 and 3 had a high cationic charge of 8.9, 4.9 and 27.3 respectively at pH 7 (Supplementary Table S4). However, several AMPs identified in this study were weakly cationic (charge 0–3), and MA-CATH8 and MA-BD2 were anionic. Similarly, while a large proportion (10/14) of the characterised AMPs had > 30% hydrophobic residues, another general trend of AMPs60, some peptides like MA-CATH3 had as few as 12.42% hydrophobic residues (Supplementary Table S4). Amphipathicity is often seen in AMPs, as it helps them permeabilise microbial membranes60. Through visual inspection of Kyte and Doolittle hydropathicity plots, almost all the southern stuttering frog AMPs exhibited amphipathicity, with the N-terminus of the peptides being generally more hydrophilic than the C-terminus (Supplementary Fig. S5). MA-CATH3 exhibited some regions of hydrophobicity but was largely hydrophilic (Supplementary Fig. S5).

As these are the first AMPs characterised in the Mixophyes genus, we explored their sequence diversity and relationship to other known frog AMPs. All stuttering frog AMPs displayed low percent identity to other known AMPs by BLAST, with MA-BD2 having the highest percent identity of all AMPs in this study to a Chinese spiny frog (Quasipaa spinosa) β-defensin (64.15%) (Supplementary Table S4). As expected, maximum likelihood phylogenetic trees generated using all known frog cathelicidins and β-defensins reflected this diversity, as indicated by the long branch lengths (Fig. 2). MA-CATH1-3 and MA-CATH4-8 formed two species-specific clades within the cathelicidin tree, some of which were strongly supported with > 95% ultrafast bootstrap support (Fig. 2). MA-CATH9-12 clustered within clades containing AMPs from multiple Asian and African frog species, albeit with low bootstrap support (Fig. 2). While Fig. 3 suggests that the two southern stuttering frog β-defensins are more closely related to each other than to the other frog β-defensins, this relationship was not well supported.

Figure 2

Phylogenetic relationships amongst frog cathelicidins. Tree was generated using the maximum likelihood method, and rooted using four fish cathelicidins as an outgroup. Branches are coloured by ultrafast bootstrap values, with values > 95% in red, and the remaining branches coloured in black. Stuttering frog cathelicidins are coloured green. Cathelicidins are labelled by geographic region; Oceania is green; Asia is blue; North America is pink; Africa is yellow; and Asia and Europe combined is purple. The tree was annotated using MEGA11. For frog cathelicidin sequences used, see Supplementary Table S5.

Figure 3

Phylogenetic relationships amongst frog defensins. Tree was generated using the maximum likelihood method and is unrooted. Branches are coloured by ultrafast bootstrap values, with values > 95% in red, and the remaining branches coloured in black. Stuttering frog defensins are coloured green. Defensins are labelled by geographic region; Oceania is green and Asia is blue. The tree was annotated using MEGA11. For frog defensin sequences used, see Supplementary Table S5.

Our comprehensive bioinformatic AMP discovery approach allowed us to explore the expression pattern of AMPs across a range of tissues beyond skin, which is typically the target of peptidomic approaches to AMP discovery. As expected, we observed high AMP gene expression in the dorsal and ventral skin in our specimen. However, we also observed AMP gene expression in other internal organs (Fig. 4). AMP gene expression was found in the liver, spleen, and gonads, with MA-CATH6 and 8 showing the highest expression. MA-CATH10 and MA-BD2 exhibited higher expression in the skin than other AMPs, while some peptides like MA-CATH12 were lowly expressed across all the tissues when compared to other AMPs (Fig. 4).

Figure 4

Expression data of the novel cathelicidins (MA-CATH1-12) and β-defensins (MA-BD1-2) from the stuttering frog. Transcripts per Million (TPM) values were derived from the tissue-specific transcriptomes. Note different x-axis scales due to differences in relative expression of AMPs between tissues.

Discussion

Here, we used a combination of PacBio HiFi and Hi-C data to generate the first contiguous, annotated, reference genome for the southern stuttering frog. In addition, we assembled the mitochondrial genome and six tissue-specific transcriptomes that were merged into a global transcriptome. The level of resources generated for this single species is comparable with model Xenopus species, as few frogs have a publicly available nuclear genome, mitochondrial genome, as well as transcriptome. By comparing genome and transcriptome quality metrics between other Australian and model assemblies, we have also determined that these resources are comparable in contiguity and completeness to the model Xenopus assemblies. Additionally, they are the first ‘omics level resources in the Mixophyes genus. As the Myobatrachidae family is one of the oldest, most diverse frog families in Australia40, these resources add to our understanding of Australian fauna.

To demonstrate the insights that can be gained from these genomic and transcriptomic resources, we bioinformatically characterised 12 cathelicidins and two β-defensins, the first Australian frog AMPs to be discovered using this approach. While other peptidomics-based studies describing cathelicidins in frogs have revealed at most two per species61,62, 12 cathelicidins were discovered in this study. Similarly, while one β-defensin has been characterised per species in frogs57,58, two were found in this study. In our preliminary gene expression analysis, some of these AMPs, such as MA-CATH1, MA-CATH10 and MA-BD2, were primarily expressed in the dorsal and ventral skin. However, several others had little to no expression in the skin but were primarily expressed in the liver (MA-CATH6) or spleen (MA-CATH7-8), or at low levels in the brain (MA-BD1). Due to the endangered threat status of the southern stuttering frog, tissue-specific transcriptomes from multiple specimens were not available to determine if these expression patterns are consistent between individuals. However, the number and preliminary expression patterns of the discovered AMPs imply that the peptides that were not expressed in the skin may not have been identified if the more conventional peptidomics approach of screening skin secretions had been used. Our results reveal the benefits of a genomic-based AMP discovery approach and the need for more amphibian genomic resources to characterise such peptides.

It is possible that the evolutionary isolation of Australian amphibians compared to other characterised species has resulted in the diverse suite of cathelicidins and β-defensins observed. Alternatively, the range of cathelicidins and β-defensins characterised here may suggest that the southern stuttering frog is under greater microbial exposure than other frogs. A previous study on Diptera (fly) species has shown that AMP diversity and gene duplication are positively correlated with microbial exposure25. This is also likely the case in frogs as they adapt to different global aquatic and terrestrial environments. It is more likely, however, that other frog species and genera also exhibit a similar number and diversity of cathelicidins and β-defensins that remain uncharacterised. The lack of bioinformatic AMP discovery for most frog species makes comparative analysis difficult. Future investigations into frog AMP diversity that incorporate a genomic-based discovery platform will facilitate direct comparisons between species.

Our phylogenetic analysis revealed that some southern stuttering frog cathelicidins formed species-specific sister clades to those containing cathelicidins from frogs of Europe, North America, and Asia (Fig. 2). This may indicate that these particular cathelicidins were the result of gene duplication events that occurred after the southern stuttering frogs diverged from the other frog species within the phylogenetic tree. Indeed, several southern stuttering frog cathelicidins have the same signal and cathelin region, but a different mature peptide. Other southern stuttering frog cathelicidins clustered with cathelicidins from Asian and African frog species (Fig. 2). This suggests these cathelicidins may be the result of gene duplication events in a more distant common ancestor. However, due in part to the great variability in AMP sequences, many of the relationships identified were not strongly supported, particularly for β-defensins, limiting the validity of insights drawn from these trees. It also remains difficult to ascertain whether the species-specific clusters of southern stuttering frog cathelicidins are truly unique to M. australis or the Mixophyes genus more broadly, as there are currently no other characterised cathelicidins from Mixophyes frogs. The number of known cathelicidins and β-defensins across the Anuran order is limited, with only one cathelicidin characterised in a European frog, and no known cathelicidins or β-defensins from South American frog species. As the Myobatrachidae family shares a distant common ancestor with South American frogs40, evolutionary relationships within AMP families across these geographical regions are likely not captured. As more frog cathelicidins and β-defensins are characterised, in particular from Australia and South America, future investigations may better identify the evolutionary patterns of AMP diversity across Mixophyes and other frogs.

The characterised and predicted properties of the stuttering frog AMPs suggest that they may play diverse immunological roles. AMPs are generally cationic and amphipathic, properties that enable electrostatic interactions with anionic glycolipids on prokaryotic membranes and thereby facilitate membrane permeabilisation and cell lysis63,64. However, AMPs can also exhibit diverse activities beyond antimicrobial activity and may serve other immune functions. For example, an anionic cathelicidin from a salamander species (TK-CATH) had no detectable antimicrobial activity, but instead inhibited pro-inflammatory cytokine gene expression when added to mammalian macrophage cell lines65. Two of our AMPs were anionic (MA-CATH8 and MA-BD2; Supplementary Table S4), suggesting that they may have other immune functions. Future investigations will need to validate these in silico findings, such as by synthesising these AMPs and investigating their effects on immune gene expression in a range of cell lines.

While our bioinformatic characterisation of novel AMPs has demonstrated one application of the newly generated genomic and transcriptomic resources, there are numerous other potential applications. For instance, custom DNA metabarcoding markers generated from the mitochondrial genome can now be developed for the southern stuttering frog, contributing to applied conservation outcomes in this threatened species. Metabarcoding has been extensively used to characterise the biodiversity of different environments66. Mitochondrial metagenomics (mtMG) has been previously used to distinguish between closely related species of nematodes67. As the southern stuttering frog is a recently defined species, closely related to the northern population of stuttering frogs (M. balbus)42, the mitochondrial genome generated in this study may be a useful monitoring tool, particularly in defining their range and overlap if applicable. Highly contiguous genomes and transcriptomes can also be used to characterise genomic regions that have high repeat content, such as the major histocompatibility complex (MHC)68. In frogs, the upregulation of MHC class I and II genes in the Montane brown frog (Rana ornativentris) has a functional role in tadpole development69. MHC heterozygosity is also a significant predictor of chytrid fungus resistance in the Lithobates genus70,71. Finally, these resources may be used comparatively with other genomes to investigate conserved and specialised traits across taxa. For example, emerging consortia like the Zoonomia Project have made significant progress in advancing our understanding of mammalian adaptations and evolutionary history72. Synteny analyses have previously been conducted in amphibians, but the variation in sequencing quality and gene annotation methods across the limited existing genomes has made deriving insights difficult73. Comparative studies in Anurans using high-quality genomic resources from representative genera would advance our understanding of a wealth of unique traits; some frog species can survive in extreme temperatures and environments74,75, produce a myriad of toxins76, and regenerate lost appendages as tadpoles77. The incorporation of the southern stuttering frog genome in future studies into amphibians will facilitate a better representation of evolutionarily unique Australian biodiversity in these investigations. In short, generating high-quality multi-omics resources facilitates a plethora of investigations into the southern stuttering frog and amphibians at large.

Methods

Sampling, extractions and sequencing

A wild-caught, adult, male stuttering frog was medically euthanised in September 2021 (32° 59′ 52.4″ S 151° 24′ 27.3″ E). Heart and kidney tissues were flash-frozen in liquid nitrogen for DNA extraction. Gonad, brain, dorsal skin, ventral skin, liver, and spleen tissues were stored in RNAlater at −80 °C until RNA extraction. Lethal sampling was conducted under the University of Newcastle Animal Care and Ethics Committee (ACEC Number A-2013-339) and NSW scientific licence (SL190). All methods were performed in accordance with relevant guidelines and regulations; animal research was conducted in compliance with the ARRIVE guidelines.

DNA was extracted from heart and kidney tissue using a Nanobind Tissue Big DNA Kit (Circulomics). Total extracted DNA was verified to be > 20 μg through Qubit fluorometric quantification (ThermoFisher Scientific). DNA was pooled and sequenced at the Australian Genome Research Facility (Brisbane, Queensland, Australia) using a SMRTbell® prep kit 3.0 (PacBio), and circular consensus sequencing (CCS) was performed using three SMRT cells on a PacBio Sequel II system. For Hi-C sequencing, heart and kidney tissue were washed twice for 5 min with 1 × PBS using a rotator wheel at room temperature. Tissues were processed at the Biomolecular Resource Facility (Canberra, ACT, Australia) using the Arima Hi-C kit and sequenced as 150-bp paired-end (PE) reads on an Illumina NovaSeq 6000.

RNA was extracted from the six tissues using a Qiagen RNeasy Mini Kit. Concentrations of each sample were confirmed to be ≥ 25 ng/µl using a Nanodrop spectrophotometer (ThermoFisher Scientific) and the RNA integrity number (RIN) measured using the standard Agilent RNA 6000 Nano Kit Protocol and BioAnalyzer (Agilent Technologies). Extractions < 7 RIN were not sequenced. Extracted RNA was prepared at the Ramaciotti Centre for Genomics (Sydney, NSW, Australia), using the Illumina Stranded mRNA Prep Protocol and sequenced as 100-bp PE reads on an Illumina NovaSeq 6000 S1 flowcell.

De novo genome assembly

Over 15 million raw HiFi reads were generated from three SMRT cells. To prevent low-quality reads from introducing errors into the assembly, reads with a Phred (Q) quality score < 20 were filtered out using bamtools v2.4.1 (ref. 78). Reads containing adapter sequence were removed by HiFiAdapterFilt v2.0.0 (ref. 79) with default parameters. Further details on the computational requirements and estimated run times for all bioinformatic analyses are provided in Supplementary Table S6. Hifiasm80 assembled the remaining reads, alongside paired Hi-C reads, into contigs. To merge the contigs into a scaffolded assembly, we first mapped the Hi-C reads to the unscaffolded genome following the Arima Hi-C mapping pipeline (A160156 v02; https://github.com/ArimaGenomics/mapping_pipeline). YaHS81 was used to merge contigs containing complementary pairs of reads, and the contact map was visualised with Juicebox82. One misassembly in the first scaffold was manually corrected. We used MitoHiFi v3.2 to identify and annotate the mitochondrial genome using a closely related mitogenome as input (Lechriodus melanopyga; NC_019999.1)83,84.

Genome statistics (e.g., N50 and L50 values) were calculated with bbmap v38.86 (https://sourceforge.net/projects/bbmap/). We identified regions matching canonical telomere hexamer repeats (TTAGGG/CCCTAA) using FindTelomeres (https://github.com/JanaSperschneider/FindTelomeres). Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.3.2 (ref. 49) analysis was performed on Galaxy Australia85, using the ‘genome’ mode, applying the ‘augustus’ gene-finding setting, and with the vertebrata_odb10 lineage. To compare the gene completeness of the stuttering frog with other genomes, BUSCO analysis was also performed with the same settings on two model frog genomes, the African clawed frog (Xenopus laevis; GCA_017654675.1) and the western clawed frog (Xenopus tropicalis; GCA_000004195.4), as well as the three publicly available scaffolded Australian frog genomes, the southern banjo frog (Limnodynastes dumerilii; GCA_011038615.1), ornate burrowing frog (Platyplectrum ornatum; GCA_016617825.1), and corroboree frog (Pseudophryne corroboree; GCA_028390025.1). An alternative assessment of genome completeness, inclusive of non-coding and repetitive regions, was performed for the stuttering frog and P. ornatum genomes using Merqury v1.3 (ref. 52). RepeatModeler v2.0.1 (ref. 86) was used to generate a de novo database of the repetitive regions in the stuttering frog genome. We characterised repeats and masked the genome with RepeatMasker v4.0.6 (ref. 87). The repeat-masked genome was indexed with hisat2 v2.1.0 (ref. 88).
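For readers unfamiliar with the contiguity statistics named above, the following Python sketch shows how N50 and L50 are conventionally derived from a list of scaffold lengths. It is a minimal illustration under the standard definitions, not the bbmap implementation, and the function name `n50_l50` is hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold or contig lengths.

    N50 is the length of the scaffold at which the running total, taken from
    longest to shortest, first reaches half of the assembly size. L50 is the
    number of scaffolds needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must contain at least one scaffold")
```

For example, scaffold lengths of 10, 8, 6, 4 and 2 units sum to 30; the two longest scaffolds together reach 18, which exceeds half of the total, so the N50 is 8 and the L50 is 2.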

Reference-aligned global transcriptome assembly

Over 900 million PE reads were generated across the six tissue transcriptomes. Low-quality sequence calls, adapter and primer sequences were trimmed using Trimmomatic v0.39 (ref. 89) with ILLUMINACLIP:TruSeq3-PE.fa:2:30:10, SLIDINGWINDOW:4:5, LEADING:5, TRAILING:5, and MINLEN:25 settings. The reads from each tissue sample were aligned to the genome with hisat2 v2.1.0 (ref. 88). StringTie v2.1.6 (ref. 90) was used to assemble aligned reads into tissue-specific transcriptomes. We used Transcriptome Annotation by Modular Algorithms (TAMA) v1.0 (ref. 91) to combine the tissue-specific transcriptomes into a global transcriptome with adjustments to minimise duplicate transcripts. Briefly, the ‘-d merge_dup’ flag was applied to merge identical transcripts, and the ‘-z 500’ flag was applied to allow transcripts with variable 3’ ends (differences of up to 500 bp) to be merged. Transcripts with weak evidence were removed, including transcripts that were found in only one tissue and were lowly expressed (fragments per kilobase of transcript per million fragments mapped [FPKM] < 0.1). CPC2 v2019-11-19 (ref. 92) was used to predict coding regions; non-coding regions were removed. Open reading frames were predicted by TransDecoder v2.0.1 (ref. 93). BUSCO v5.3.2 (ref. 49) analysis was performed using ‘transcriptome’ mode for both the tissue-specific transcriptomes and the global transcriptome, as well as the X. tropicalis transcriptome for comparison.
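The weak-evidence criterion described above (a transcript found in only one tissue with FPKM < 0.1) can be expressed as a short Python sketch. The data layout, a dictionary mapping each transcript identifier to the FPKM values of the tissues in which it was assembled, and the function names are assumptions made for illustration; the sketch covers only this one stated criterion and does not reproduce the TAMA workflow.

```python
def is_weak(fpkm_by_tissue, min_fpkm=0.1):
    """Return True if a transcript was found in exactly one tissue at low FPKM.

    fpkm_by_tissue maps tissue name -> FPKM, listing only the tissues in which
    the transcript was assembled (a hypothetical layout for this sketch).
    """
    if len(fpkm_by_tissue) != 1:
        return False
    (fpkm,) = fpkm_by_tissue.values()
    return fpkm < min_fpkm


def filter_transcripts(transcripts, min_fpkm=0.1):
    """Drop weakly supported transcripts and return the remaining entries."""
    return {
        tid: tissues
        for tid, tissues in transcripts.items()
        if not is_weak(tissues, min_fpkm)
    }
```

Note that a lowly expressed transcript is retained when it is supported by more than one tissue, since both conditions must hold for removal under this reading of the criterion.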

Genome annotation

The genome was annotated with FGENESH++ v7.2.2 (ref. 53) informed by the global transcriptome. Non-mammalian settings were applied throughout the pipeline, and parameters were optimised for Anuran gene discovery by providing the Xenopus gene-finding matrix (Softberry). BUSCO v5.3.2 (ref. 49) analysis was performed on the annotation using ‘protein’ mode, as well as for the X. laevis, X. tropicalis and L. dumerilii annotations. The number of genes, exons and introns from the stuttering frog annotated assembly was calculated using the ‘genestats’ script (https://github.com/darencard/GenomeAnnotation/blob/master/genestats).

AMP characterisation

To ensure a comprehensive search of genomic regions that could encode cathelicidins or β-defensins, the annotated genome and global transcriptome were queried using Basic Local Alignment Search Tool (BLAST) v2.2.30+ (refs. 94, 95) and HMMER v3.3 (ref. 96). A dual program approach has previously been applied in other bioinformatic searches for cathelicidins and β-defensins97,98,99. Further explanation of the approaches used is provided in the Supplementary Extended Methods.

AMP phylogeny

A phylogenetic tree was generated for each of the cathelicidin and β-defensin gene families; the cathelicidin tree was rooted using a fish outgroup, while the β-defensin tree was left unrooted (Figs. 2 and 3). We first aligned amino acid sequences for each gene family using ClustalW. Both alignments included the full prepropeptide sequences from the stuttering frog and other available frogs (Supplementary Table S5). The cathelicidin alignment also included four fish cathelicidins as an outgroup (Supplementary Table S5). The best-fitting substitution models for each alignment were selected using the ModelFinder option in IQ-TREE v2.2.0 (ref. 100) according to the Bayesian information criterion101. For the cathelicidin alignment, the Jones Taylor Thornton (JTT) substitution model102 was optimal, incorporating invariant sites (+I) and four categories of gamma rate heterogeneity (+G4). For the β-defensin alignment, the Dayhoff model was optimal103. Maximum likelihood analysis was conducted in IQ-TREE v2.2.0 (ref. 100) with node support estimated by ultrafast bootstrap approximation with 1000 replicates104. The results were visualised and annotated using MEGA11 (ref. 105).

Characterisation and prediction of AMP structure, properties and expression

Cathelicidins are made up of a signal peptide, a conserved cathelin pro-region, and the mature, bioactive peptide56,106. To predict the signal peptide from the full prepropeptide sequence, the SignalP v6.0 webserver107 was used with the ‘Eukarya’ organism setting and the ‘Slow’ model mode. While the enzyme used to cleave the mature peptide from the cathelin pro-region is not known for amphibians, there is experimental evidence to support cleavage by proprotein convertases or trypsin-like proteases, which cleave at basic residues such as lysine (K) and arginine (R), often in dibasic pairs61,108,109. Therefore, a two-tiered approach was used to predict the mature peptide of the novel cathelicidins. Proprotein convertase cleavage sites were first predicted from the last exon of the cathelicidins using ProP v1.0 (ref. 110). If no cleavage site was identified, trypsin cleavage sites were predicted using ExPASy’s PeptideCutter tool111. For β-defensins, which generally consist of a signal peptide and a mature, bioactive peptide112,113,114, the signal peptide was predicted using SignalP, and the remaining peptide was annotated as the mature peptide.
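The logic of the two-tiered search can be sketched in Python as follows. This is an illustration only: the regular expression for the minimal furin-type consensus R-X-(K/R)-R is a simplified stand-in for the ProP predictor, the trypsin rule (cleavage C-terminal to K or R unless the next residue is proline) is a simplification of PeptideCutter, and all function names are hypothetical. How a single mature peptide is chosen among candidate sites is not specified here, so the sketch only returns candidate positions.

```python
import re

# Minimal furin-type consensus R-X-(K/R)-R; cleavage follows the final R.
# ASSUMPTION: a regex stand-in for illustration, not the ProP predictor.
PC_MOTIF = re.compile(r"(?=(R.[KR]R))")


def pc_sites(seq):
    """1-based positions of the residue after which a convertase motif ends."""
    return [m.start() + 4 for m in PC_MOTIF.finditer(seq)]


def trypsin_sites(seq):
    """1-based positions of internal K/R residues not followed by proline."""
    return [
        i + 1
        for i, aa in enumerate(seq)
        if aa in "KR" and i + 1 < len(seq) and seq[i + 1] != "P"
    ]


def candidate_cleavage_sites(last_exon_peptide):
    """Tier 1: convertase-like motifs; tier 2 (fallback): trypsin-like sites."""
    seq = last_exon_peptide.upper()
    sites = pc_sites(seq)
    if sites:
        return "proprotein convertase", sites
    return "trypsin", trypsin_sites(seq)
```

In this sketch the trypsin tier is consulted only when no convertase-like motif is found, mirroring the order of the two tiers described above.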

The molecular weight and charge at pH 7 for each putative stuttering frog cathelicidin and β-defensin were calculated using Protein Calculator v3.4 (https://protcalc.sourceforge.net/). The percentage of hydrophobic residues in the AMPs was calculated on Peptide v2.0 (https://www.peptide2.com/) using the ‘Peptide Hydrophobicity/Hydrophilicity Analysis’ tool. The amphipathicity of the AMPs was determined by generating Kyte and Doolittle hydropathicity plots on ExPASy using Protscale111, with a window size of 5. These plots were inspected for the presence of ‘peaks’ and ‘troughs’, which indicate sections in the AMP of high and low hydropathicity115. For the expression analysis, the Transcripts Per Million (TPM) values for each AMP were generated from the tissue-specific transcriptomes using StringTie v2.1.6 (ref. 90).
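A sliding-window hydropathicity profile of the kind plotted above can be sketched in Python using the published Kyte and Doolittle scale and a window of 5. The function name is hypothetical, and the sketch assumes an unweighted mean across each window; it is intended to illustrate the calculation rather than to reproduce the ExPASy tool exactly.

```python
# Kyte and Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=5):
    """Return the mean Kyte-Doolittle score of each full window along seq.

    ASSUMPTION: every residue in the window is weighted equally. Sequences
    shorter than the window yield an empty profile.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if len(scores) < window:
        return []
    return [
        round(sum(scores[i:i + window]) / window, 3)
        for i in range(len(scores) - window + 1)
    ]
```

Positive values in the profile correspond to the hydrophobic ‘peaks’ and negative values to the hydrophilic ‘troughs’ inspected in the plots; a profile that moves between the two along the peptide is the pattern read here as amphipathicity.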