To further comprehend the genome features of Cephalloscyllium umbratile (Carcharhiniformes), an endangered species, the complete mitochondrial DNA (mtDNA) was firstly sequenced and annotated. The full-length mtDNA of C. umbratile was 16,697 bp and contained ribosomal RNA (rRNA) genes, 13 protein-coding genes (PCGs), 23 transfer RNA (tRNA) genes, and a major non-coding control region. Each PCG was initiated by an authoritative ATN codon, except for COX1 initiated by a GTG codon. Seven of 13 PCGs had a typical TAA termination codon, while others terminated with a single T or TA. Moreover, the relative synonymous codon usage of the 13 PCGs was consistent with that of other published Carcharhiniformes. All tRNA genes had typical clover-leaf secondary structures, except for tRNA-Ser (GCT), which lacked the dihydrouridine ‘DHU’ arm. Furthermore, the analysis of the average Ka/Ks in the 13 PCGs of three Carcharhiniformes species indicated a strong purifying selection within this group. In addition, phylogenetic analysis revealed that C. umbratile was closely related to Glyphis glyphis and Glyphis garricki. Our data supply a useful resource for further studies on genetic diversity and population structure of C. umbratile.
Cephalloscyllium umbratile (Cephaloscyllium, Scyliorhinidae, Chondrichthyes), belonging the Carcharhiniformes order, is one of the most important aquarium and reef fish, and mainly distribute in the coastwise of China, Vietnam and Japan. Due to small amount, it is regarded as endangered species, and absorbed in red list of International Union for Conservation of Nature (IUCN)1. Since the information about C. umbratile has been generally scarce, with the development of offshore fishery, increasing research interest has been developed in conservation as well as in scientific and economic topics regarding reef fish2,3.
In Chondrichthyes, the typical complete mitochondrial DNA (mtDNA) was circular and approximately 17 kb in length with correspondingly conserved gene content which encoded 37 genes, including 22 transfer RNA (tRNA), 13 protein-coding genes (PCGs), 2 ribosomal RNA (rRNA), a major non-coding control region (D-loop region), and an A + T-rich region4,5. Furthermore, genomic information is considered to be reliable for the efficient implementation strategies to study evolutionary relationships, phylogeography and phylogeny6,7. Due to its conserved gene content, maternal inheritance, a small genome size, relatively fast evolutionary rate, high copy number and lack of intermolecular genetic recombination8,9,10, mtDNA has been broadly adopted in species identification11,12, genome evolution13,14,15,16 and nonsynonymous (Ka) and synonymous (Ks) substitutions of many species17,18,19,20,21,22,23.
Moreover, Carcharhiniformes include about 49 genera and over 200 species, and many of them are important economic categories. Nevertheless, several evidences gathered with genome synteny analysis have revealed a number of shared unique mitochondrial gene features in Chondrichthyes, towards a better understanding of the functions and evolution of Chondrichthyes24,25,26,27. So far, there was still a notably lack of mtDNA information in Carcharhiniformes. In order to provide a theoretical foundation for the conservation strategy of C. umbratile within Scyliorhinidae and new sight for further studies of phylogenetically-informative sequence data, in the current study the complete mtDNA of C. umbratile was sequenced, assembled and annotated, and compared with other members of Carcharhiniformes.
Results and Discussion
Genome size and organization
About 1.5 G raw data is generated with reads length 125 bp. Sequencing coverage and depth (X) of mtDNA data is 100% and approximately 394.23, respectively. Reads number is 52,660 and total bases (bp) is 6,582,500. The mtDNA of C. umbratile was a closed-circular DNA molecule of 16,697 bp in length (GenBank: KX354996; Fig. 1, Table 1), which was comparable to other Carcharhiniformes mtDNA ranging from 16,697 bp in Scyliorhinus canicula 25 to 16,719 bp in Carcharhinus acronotus 28. Nucleotide BLAST (blastn) of the whole C. umbratile mtDNA against other Carcharhiniformes revealed sequence identities with closely related species of 88% (S. canicula), 84% (Proscyllium habereri), and 84% (Pseudotriakis microdon) and with distantly related species of 82% (Scoliodon laticaudus), 82% (Hemigaleus microstoma), 82% (Hemipristis elongata) (Supplementary Table 1). The mtDNA of C. umbratile contained 2 rRNA genes, 13 PCGs, 22 tRNA genes and D-loop region. The arrangement of the genes was identical to that of other Scyliorhinidae mtDNA (Table 1)29,30. Among these genes, 29 genes (12 PCGs, 2 rRNA genes and 15 tRNA genes) are located on the heavy strand (H-strand) and the others (1 PCGs and 8 tRNA genes) are located on the light strand (L-strand) (Table 1). These obvious features have also been reported in other Carcharhiniformes species31,32 and could be regarded as effective markers for authentication at genus and species level.
The nucleotide composition of the mtDNA is biased toward A + T nucleotides (52.9%), which made up of 61.8%, 61.4%, 61.5% and 68.7% in the PCGs, tRNA, rRNA and D-loop region, respectively (Table 2). However, the A + T nucleotide composition in C. umbratile was the lowest among Carcharhiniformes. The positive AT skew (0.025) observed here with the presence of more As than Ts, was similar to that only in Sphyrna tiburo (0.031), nevertheless, mtDNA in majority of Carcharhiniformes showed negative AT skew (Table 2). The GC skew ranged from −0.324 in S. tiburo to 0.040 in C. macloti (Table 2). The C. umbratile mtDNA was negative (−0.245), indicating the presence of more Cs than Gs.
Protein-coding gene features
The PCG region formed 68.5% of the C. umbratile mitogenome, and was 11,440 bp long. Furthermore, a contrast of nucleotide composition, AT-skew, and GC-skew of Carcharhiniformes PCGs were exhibited in Table 2. A + T content of the rRNA genes was 61.75%. The AT skew value (−0.282) of the PCG region in the C. umbratile mtDNA was higher than that of several reported mtDNA, nevertheless the negative GC skew (−0.069) was similar to that observed in other fish33,34.
Each PCG was initiated by a canonical ATN codon, except for COXI, which was initiated by a GTG codon (Table 1). Similar results have been documented in other Carcharhiniformes35,36. Seven of 13 PCGs (ND1, COXI, ATP8, COXIII, ND4L, ND5, ND6) used a typical TAA termination codon, which was typical for Carcharhiniformes mtDNA35,36; whereas COXII, ND3 and ND4 terminated with a single T and ATP6, ND2 and Cytb terminated with TA (Table 1). It was akin to sequenced mtDNA of Carcharhiniformes, including Triaenodon obesus 37, Carcharhinus macloti 38, Mustelus griseus 39, S. canicula 25 and C. acronotus 28.
A total of 3,803 amino acids of PCGs are encoded in C. umbratile. In addition, the codon usage is shown in Table 3. The most frequent amino acids in the PCGs of C. umbratile were Leucine (17.3%), Isoleucine (9.02%) and Alanine (7.45%) (Table 3). Relative synonymous codon usage (RSCU) analysis of PCGs in C. umbratile revealed that the codons encoding Leu, Thr, Ala, Arg, Gln, Gly, Pro and Ser were the most frequently present, nevertheless those encoding Asn, Asp, Cys and Lys were rare (Fig. 2). In the PCGs of the eight species examined, codon distributions and amino acid content were corresponding among species (Fig. 3). It was declared that conserved amino acid sequences were present among those fish28,32,40. Moreover, codons with A or T in the third position were overused in comparison to other synonymous codons, for example, the codons for glutamine CAG and GAG were rare, while the synonymous codons CAA and GAA were prevalent (Fig. 4), which is consistent with previous observations of Carcharhiniformes36.
Transfer RNAs and ribosomal RNAs
The representative complement structures of 22 tRNAs were identified in the C. umbratile mtDNA, ranging from 62 bp (tRNAThr) to 76 bp (tRNALys)35,36 for 1,538 bp in total (Table 1). Of those, the highest A + T content of tRNAs was S. canicula and the lowest was C. sorrah. Fifteen tRNA genes were encoded on the H strand while the remains were located in the L strand (Table 1). The overall A+T content of tRNAs was 61.38% which was approximate to that observed in Loxodon macrorhinus (61.4%). The negative AT skew (−0.098) and positive GC skew (0.059) showed in the C. umbratile mtDNA were also analogous with several sequenced Carcharhiniformes (Table 2).
The forecasted tRNAs were shown in Fig. 5. All of the tRNAs could be folded into classic clover-leaf secondary structures in C. umbratile, except for tRNA-Ser (GCT), which lacked the dihydrouridine ‘DHU’ arm (Fig. 5). The ‘DHU’ arm of this tRNA was a large loop instead of the conserved stem-and-loop structure. Due to a representative characteristics41, it was also observed in other Chondrichthyes mtDNA, including Chiloscyllium griseum 42 T. obesus 37 and so on. Fifteen of the tRNA genes were each observed to have at least one G-T mismatches in their respective secondary structures, which forming a weak bond. Five T-T mismatches were present in the respective amino acid acceptor stems of tRNA Asp(GTC), tRNA Cys(GCA), tRNA His(GTG), tRNA Ile(GAT) and tRNAMet(CAT) (Fig. 5). Interestingly, A-G mismatch was also present in tRNA-Leu (TAA). Unmatched base pairs perceived in tRNA sequences can be amended by RNA-editing mechanisms that were well known for vertebrate mtDNA43.
The A + T content of the rRNA genes was 61.46%, indicating an A+C-rich trend as in other Scyliorhinidae fish25. AT and GC skews were negative (−0.082) and positive (0.132), respectively (Table 2). The 12S rRNA and 16S rRNA subunit gene of C. umbratile was 954 bp and 1,668 bp in length, respectively. As in other vertebrates44, both two genes are separated by the tRNA Val gene, and located between tRNA Phe and tRNA Leu(UUR) (Fig. 1, Table 1). The overall content of the rRNA was analogous to that observed for other Carcharhiniformes.
The control region
The length of D-loop region of C. umbratile was 1,059 bp, which was less long than majority of Carcharhiniformes. The A + T content was 68.65%, and equal with other Carcharhiniformes (Table 2), which was consistent with the findings of previous reports on other teleosts33,45,46. Moreover, both of the AT-skew and GC-skew were strongly negative (Table 2).
Overlapping and intergenic spacer regions
There were three gene boundaries where bases overlapped between adjacent genes, ranging from 4–22 bp in size. The longest overlapping region was 22 bp between ATP8 and ATP6 (Table 1) which has been documented in several other Chondrichthyes mtDNA4,25,32. Moreover, intergenic spacers of C. umbratile were spread over 12 locations and ranged from 1–36 bp, making up 60 bp in total, and the longest intergenic spacer region (36 bp) was between tRNA Asn and tRNA Cys (Table 1).
Synonymous and nonsynonymous substitutions
The ratio of Ka/Ks is generally regarded as a pointer of selective pressure and evolutionary relations at the molecular level among homogenous or heterogeneous species47,48. It is reported that Ka/Ks > 1, Ka/Ks = 1, and Ka/Ks < 1 popularly declared positive selection, neutral mutation and negative selection, respectively49. To investigate the evolutionary rate differences in three Carcharhiniformes mtDNA (C. umbratile, S. canicula and P. habereri), sequence divergences by counting Ka and Ks substitution rates were next calculated. The Ka/Ks values of 13 PCGs varied from 0.0198 (COXI) to 0.5322 (ATP8) and were less than 0.6 (Ka was lower than Ks) for all other genes which indicated a strong purifying and negative selection in those fishes (Fig. 6). Our result of the Ka/Ks ratio illustrated that the multitudinous genes evolved under strong negative selection which meant natural selection against profitless mutations with negative selective coefficients50. The percentages of variable sites of SC/PH were the highest in COXIII and ND1 among the groups, while the percentages was the least in COXI gene, which indicated that COXIII and ND1 were under the least selective pressure, and COXI was under the most selective pressure among all mitochondrial proteins. In C. umbratile and S. canicula, the ratio of Ka/Ks was the least in all 13 protein-coding genes compared to P. habereri, implying that these two Scyliorhinidae fish had the closer phylogenetic relationship than P. habereri, which was consistent with their rozmieszczenie naturalne and ecological habit25.
To understand the phylogenetic relationships among Carcharhiniformes, base on Maximum Likelihood (ML), Neighbor Joining (NJ) and Bayesian Inference (BI) methods, a dataset of 25 species containing the concatenated nucleic acid and amino acid sequences of 13 PCGs was used to generate phylogenetic relationships (Fig. 7). The topologies of the 6 phylogenetic trees were analogical in our study. The results implied that strong statistics supported for the following relationship among the 5 Superfamily (Scyliorhinidae, Carcharhinidae, Hemigaleidae, Proscylliidae, Pseudotriakidae) (Fig. 7A,B). This clustered pattern of 5 Superfamily was broadly consistent with previous studies32,42,51,52,53. Furthermore, based on all of ML, NJ and BI methods, 5 superfamily divided into 13 closely genera, and C. umbratile (Cephaloscyllium) was most closely related to S. canicula (Scyliorhinus) in Scyliorhinidae, which was accord with the tendency of nucleotide sequence identity and a recent study51,54,55,56,57. Scyliorhinidae was most closely related to Proscylliidae. Additionally, further taxon sampling within Scyliorhinidae and related superfamilies is required to resolve the location of Scyliorhinidae in Carcharhiniformes.
Materials and Methods
Sample collection and mitochondrial DNA extraction
C. umbratile juveniles were collected from South China Sea (Longitude 5°20.267′ N and latitude 109°48.435′ E) in September 2014 and directly frozen. Muscle tissues were used for DNA extraction according to the Genomic DNA Extraction Kit’s instructions (TaKaRa MiniBEST Universal Genomic DNA Extraction Kit Ver.5.0, Japan). The quantity (concentration) of isolated total DNA was determined by NANODROP 2000 spectrophotometer (Thermo Scientific, USA). Furthermore, quality of extracted DNA was assessed by electrophoresis on a 1% agarose gel stained with Gel Red™ (Biotium).
According to NEBNext DNA sample libraries kit (NEB, New England)‘s instructions, the normalized DNA (4 μg) was used to structure the paired-end library. Size and quantification estimation of the library were implemented by a Bioanalyzer 2100 High Sensitivity DNA chip (Agilent, USA). Illumina HiSeq. 2500 (2 × 101 bp paired-end reads) (Illumina, USA) was used to sequence the normalized library (2 nM).
Genome assembly and annotation
A de novo assembly of the paired-end HiSeq reads was performed using SeqMan NGen (http://www.dnastar.com/t-tutorials-seqman-ngen.aspx) (DNASTAR Inc., Madison, WI, USA)58. Assembly parameters minimum match percentage, match spacing, match size, gap penalty, mismatch penalty, maximum gap length and expected genome length were set to 93, 10, 50, 30, 20, 6% and 16,000, respectively. Accordance sequence was exported and ends were manually edited to remove duplicated nucleotides. Subsequently, the assembled sequences were aligned to NCBI nt database with blastn method (https://blast.ncbi.nlm.nih.gov/). Sequences that mapping to Carcharhiniformes mtDNA were considered as C. umbratile mtDNA. To verify the accuracy of the assembled mtDNA sequence, the primers (Supplementary Table 2) were used to amplify the genome sequence. The procedure of PCR amplification was referred from Sun et al.59. To determine whether this method was accurate, the sequence segments of same genomic region obtained from Sanger sequencing and shotgun assembly were compared. If they were identical, that meaning this method was precise. Moreover, the PCGs, rRNA genes, tRNA genes and D-loop region of mtDNA were annotated by MitoAnnotator (http://mitofish.aori.u-tokyo.ac.jp/annotation/input.html)60 with parameters of complete circular genome. The mtDNA sequence of C. umbratile has been deposited in the GenBank database under accession numbers KX354996.
Genome sequence analysis
tRNAscan-SE Search Server 1.21 program was used to primordially determine Transfer RNAs61,62. The gene map of C. umbratile mtDNA was built by OGDRAW1.2 and embellished manually63. The strand skew values were reckoned in terms of the formulae by Perna and Kocher (1995)64. The mode of “models- > Compute Codon Usage Bias” was chose to obtain RSCU in MEGA 6.065. To determine the evolutionary branching of the Carcharhiniformes lineage, codon usage in the 13 PCGs and the rates of Ka/Ks substitutions in the mtDNA of Carcharhiniformes were calculated by DnaSP 5.10.0166. To describe base composition, we analyzed skew as described as below: AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G+C)67.
To discuss the phylogenetic position of Carcharhiniformes, a total of 25 species of 13 PCG sequences were used to perform phylogenetic analysis, including those of C. umbratile. Alignments of the 13 concatenated PCGs nucleotide and amino acid sequences were conducted using ClustalX version 2.0 with default parameters68. Phylogenetic analyses for each concatenated dataset was performed using ML, MP and BI methods with MEGA 6.0 and Phylobayes 3.3 f, respectively65,69. The methods of ML and MP analysis were performed with GTR+I+G model and Subtree-Purning-Regrafting (SPR) model using MEGA 6.0, respectively. The evaluation of node accuracy was done by using 1,000 bootstrap replicates in MEGA 6.0 with default parameters. Furthermore, BI analysis was selecting the CAT-GTR model, two independent Markov chain Monte Carlo (MCMC) chains were run for 10,000 cycles. The phylogenetic tree was embellished using FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported by the Special Scientific Research Funds for Central Non-profit Institute, Chinese Academy of Fishery Sciences (2016HY-JC0304), the Science and Technology Infrastructure Construction Project of Guangdong Province (2014A030305005, 2015A030303008), National Infrastructure of Fishery Germplasm Resource Project (2017DKA30470).