Characterization of the mitochondrial genome of Megalobrama terminalis in the Heilong River and a clearer phylogeny of the genus Megalobrama

Megalobrama terminalis distributed in Sino-Russian Heilong-Amur River basin has decreased dramatically in the last few decades. It has been listed in the Red Book of the Russian Federation as an endangered fish species. Here, the complete mitochondrial (mt) genome sequence of M. terminalis in the Heilong River (MTH) was first determined and characterized. Additionally, we identified a population-specific single nucleotide polymorphism (SNP) locus in MTH which could effectively separate MTH from the six other populations of the genus Megalobrama in the absence of hybridization. Moreover, phylogenetic analyses determined that the Xi River M. hoffmanni is located at the basal branch of the clade, and the rest of the group is divided into two assemblages, namely, one containing M. terminalis from Qiantang River and Jinsha River Reservoir/Longxi River M. Pellegrini/MTH and the other containing M. amblycephala from Liangzi Lake and Yi River. We clarify the intraspecies identity of MTH and construct a clearer phylogeny of the genus Megalobrama, which will contribute to the germplasm identification, protection and development of MTH in the future.

The genus Megalobrama belongs to Cultrinae in Cyprinidae, and is one of the most economically important fish genera in China. This genus contains four main species: M. amblycephala, M. pellegrini, M. hoffmanni and M. terminalis 1,2 . The most widely distributed species among these is M. terminalis. Historically, M. terminalis inhabited the middle and upper reaches of the Yangtze, Heilong River Basin, Min, Qiantang, Huai, Yellow and Liao Rivers 1,3,4 (Fig. 1). Notably, there is only one wild population in the Jinsha River Reservoir of Hongan County in Hubei Province 5 , but its origin is unclear.
To date, most of the wild populations of M. terminalis that lived in the original habitats have disappeared except for those in the Qiantang River, Heilong River and Jinsha River Reservoir (Fig. 1). The Heilong River is a Sino-Russian border river, and is known as the Amur River in Russia. M. terminalis in the Amur River has been listed as endangered in the Red Book of the Russian Federation 6 . In the Heilong River Basin, M. terminalis is called Faluo and lives in several water systems 7 , including the Heilong River, Songhua River, Nen River, Jingpo Lake, and Khanka Lake ( Fig. 1) 3,4 . Faluo is a famous endemic fish species with several desirable qualities, such as a large body size, high intramuscular fat content and delicious taste. Since the 1960s, however, overfishing and River; JSR, Jinsha River Reservoir; HR, Huai River; YER, Yellow River; LR, Liao River; HLR, Heilong River (Fuyuan); AR, Amur River; SHR, Songhua River; NR, Nen River; JPL, Jingpo Lake; KL, Khanka Lake. A red dot represents that wild M. terminalis in the location has disappeared. A green dot indicates that wild M. terminalis in the location is still present. and mt R1) described by Huang et al. 21 was used to amplify the D-loop region. Twenty-one other pairs of primers (Table 1) with putative overlapping amplification regions were designed using Primer 5.0 (Premier Biosoft International, Palo Alto, CA, USA) according to the complete mt genome sequences of species in the genus Megalobrama available in GenBank. The PCR conditions were as follows: 94 °C for 5 min, followed by 30 cycles at 94 °C 30 s, 59 °C for 15 s, and 72 °C for 45 s, and an extension at 72 °C for 5 min. PCR components contained 13.7 µL H 2 O, 2 µL 10 × PCR buffer, 1.6 µL 1.5 mM MgCl 2 , 0.6 µL of each primer (10 µM), 0.4 µL 10 mM dNTPs, 0.1 µL Taq DNA polymerase (5 U/µL) and 1 µL DNA template (50 ng/µL) in a total volume of 20 µL. PCR products were separated on 1.0% of agarose gels with 0.5 µg/mL of ethidium bromide. The DNA fragments were purified using an E. Z. N. A. Gel Extraction Kit (Omega Bio-Tek, Norcross, GA, USA) and subcloned into pMD18-T vectors (TaKaRa, Shiga, Japan). Recombinant clones containing each target fragment were sequenced on an ABI 3700 sequencer (Applied Biosystems, PerkinElmer, Foster City, CA, USA).  www.nature.com/scientificreports www.nature.com/scientificreports/ Mitochondrial genome assembly and sequence comparison. Sequencing data and the boundaries from each segment with overlapping domains were checked to ensure authenticity. The assembled sequences were annotated by comparisons with known complete mitochondrial genomes of closely related species, including MPL and MHX. The secondary structures of tRNAs were identified using the web-based tRNA-scan SE 2.0 program (http://lowelab.ucsc.edu/tRNAscan-SE/) 22 . Complete mt sequences from MTH, MTQ, MTJ, MAL and MAY have been deposited in GenBank with the accession numbers listed in Table 2. Furthermore, complete mt genome sequences from M. terminalis in the Amur River in Russia (MTA) 19 and Parabramis pekinensis strenosoma in the Heilong River (PPH) 23 were used in the sequence correction ( Table 2). The mt genome comparisons between MTH and MTQ, MTJ, MPL, MAL, MAY, MHX, MTA and PPH were performed by BLAST using the CGView Comparison Tool (CCT) 24 .
Specific SNP loci identification. Single nucleotide polymorphism (SNP) loci of the complete mt genomes between MTH and other species from the genus Megalobrama were identified based on comparisons performed with the Clustal W method using the MegAlign program in DNASTAR Lasergene 7.1 software (DNASTAR Inc., Madison, WI, USA). Moreover, a pair of primers (forward: 5′-GCATCTGGCTTCA-ATCTCA-3′; reverse: 5′-GCGTAGGGGATTTGCTGA-3′) was designed to amplify a putative fragment of 380 bp containing the specific locus SNP locus C549T in the D-loop region in MTH based on the 349-728 bp domain of the MTH mt genome in GenBank (Accession No: MH289765). The PCR components were the same as those mentioned above, and the conditions of the reaction were as follows: 94 °C for 5 min, followed by 30 cycles at 94 °C 30 sec, 56 °C for 30 sec, and 72 °C for 30 sec, and an extension at 72 °C for 5 min. After PCR, 2 µL of RE × 10 buffer, 0.2 µL acetylated BSA and 5 U TaqI (Promega Co., USA) was added to each sample (~1 µg), and the reaction was incubated at 65 °C for 2 h. Then, all digested samples of 7 populations (30 per population) were separated on 4% agarose gels with 0.5 µg/mL ethidium bromide. phylogenetic analysis. All (Table 2) were aligned using the Clustal W algorithm 26 implemented in the MEGA 7 program 27 . Substitution saturation was evaluated by DAMBE 6.4.1 28 . Phylogenetic analyses were conducted based on complete nucleotide sequences using maximum likelihood (ML), maximum parsimony (MP) and Bayesian inference (BI) methods. ML and MP calculations were completed using MEGA 7 with 1000 bootstraps 27 . The best model (HKY + G) for the ML analysis was estimated based on the Bayesian information criterion (BIC) in this program. Bayesian phylogenetic analysis was performed using MrBayes 3.2 29 . The best model (GTR + I) was selected based on Akaike information criterion (AIC) in the Modeltest 3.7 program 30 . Two million generations with four chains were run, sampling per 500 generations and discarding the initial 25% of samples as a burn-in.

Results
Characteristics of the MtH mt genome. We determined the complete mt genome of MTH. The assembled mt genome of MTH had a typical circular structure with a length of 16,621 bp, which is similar (16,621-16,623 bp) to that of its closely related species in the genus Megalobrama (Table 3). The whole genome contains 13 protein-coding genes, 22 tRNA genes, 2 rRNA (12SrRNA and 16SrRNA), and a putative control region ( Table 3). The heavy (H) strand encodes 28 genes, while the light (L) strand encodes 9 genes (Fig. 3, Table 3). Based on prediction by tRNAscan-SE, 21 of 22 tRNAs fold into the typical cloverleaf secondary structure, while tRNA Ser(AGN) lacks the dihydrouridine (DHU) stem. ATG is the major start codon of the protein-coding genes, and only one start codon (GTG) is used in the COXI gene. In 13 protein-coding genes, seven (ND1, COX1, ATP8, ATP6, ND4L, ND5, and ND6) have a TAA stop codon, while the other six have incomplete stop codons of TA (COX3 www.nature.com/scientificreports www.nature.com/scientificreports/ and ND4) and T (ND2, COX2, ND3, and CYTB) ( Table 3). The overall GC content of the MTH mt genome is 44.1%, while that of the D-loop region is only 35.9% ( Fig. 3 and Table 4).
Comparison with closely related species of Megalobrama and parabramis. We compared the percent identity and pairwise distances of mitogenomes among MTH, MTQ, MTJ, MAL, MAY, MTA and PPH. The number of the percent identity results were showed in Table 5, and the sequences identity comparison results were reflected in the CCT BLAST map (Fig. 4). High identity was found between MTH and MTQ, MTJ, and MPL (99.6-99.7%). The identity was low between MTH and MHX (96.1%), and the lowest identities were observed between MTH and MTA and between MTH and PPH (95.2% each). However, MTA had high identity with PPH (99.7%). Seventeen specific SNPs were detected between MTH and other fish species in the genus Megalobrama. The numbers of specific SNPs of MTQ and MTJ were 8 and 11, respectively (Table 6). Thirteen MTH-specific SNPs were found in coding regions (ND1, 2; ND2, 1; Cox1, 1; CoxII, 1; CoxIII, 1; ND3, 1; ND4, 2; ND5, 3; and ND6, 1). Only three nonsynonymous SNPs are found, and they were located in ND1 (nucleotide location 4,542 in the MTH assembly and with a codon change from GTC Val to GCC Ala ), COXIII (nucleotide location 10,180 in the MTH assembly and with a codon change from AAG Lys to ACG Thr ) and ND4 (nucleotide location 11,400 in the  www.nature.com/scientificreports www.nature.com/scientificreports/ MTH assembly and with a codon change from GCC Ala to ACC Thr ) ( Table 7). There were four MTH-specific SNPs in noncoding regions (D-loop, 1; 12SrRNA, 1; and 16SrRNA, 2) ( Table 7).
Population-specific marker identification. The SNP C549T in the D-loop region formed one TaqI restriction site (T/CGA or TTGA) in the MTH population with one genotype (180/200 bp, T/CGA) (Fig. 5). In the six other populations, the base at the 549 site was T. However, only in the populations MTQ, MPL and MHX, there was one genotype (380 bp, TTGA) (Fig. 5). In the populations MTJ, MAL and MAY, there was one   www.nature.com/scientificreports www.nature.com/scientificreports/ TaqI restriction site (T/CGA or TCAG) resulting from SNP G573A and SNP A574G, which is different from the position in MTH. In the MTJ population, there were three genotypes (157/223 bp; 157/223/380 bp; 380 bp). In populations MAL and MAY, there were two genotypes (157/223 bp or 380 bp) (Fig. 5). phylogenetic analysis. The substitution saturation of all nucleotide sequences was not tested. The maximum parsimony, maximum likelihood, and Bayesian methods produced identical tree topologies with strong bootstrap and posterior probability values, and the maximum likelihood tree is shown here (Fig. 6) 19 . Based on a comparative analysis, however, we found that the lengths of the complete mt genomes of MTA, MHX, and PPH were identical (16,623 bp), and this sequence of MTA showed higher identity with that of PPH (99.7%) than with that of MTH or other M. terminalis (95.2%). In contrast, the length of the mt genome sequence (MTH) obtained in the present study was identical to those of MTQ, MTJ and MPL (16, 621 bp), and showed a high identity with them (99.6-99.8%). These results suggest that the M. terminalis sequence provided by Imoto et al. was from PPH.   www.nature.com/scientificreports www.nature.com/scientificreports/ M. terminalis and PPH have the same distribution in the Heilongjiang-Amur River basin and share some morphological and biological characteristics 31,32 . Thus, the confusion between the two is very easy to happen. Furthermore, to understand the characteristics of the mt genome of MTH, we detected 17 SNP sites of specific to MTH, which was more than the number obtained for conspecific MTQ (8) and MTJ (11). In China, all fish species in the genus Megalobrama inhabit the southern Yellow River except Faluo, which lives in the Heilongjiang basin 1,32 , where the effective growing season of fish is only ~4 months, and the overwintering period beneath ice reaches ~5 months 33 . Therefore, unique habitats and long-term geographical isolation may have led to MTH having more polymorphic sites in its mt DNA than intraspecific MTQ and MTJ. Three nonsynonymous SNPs in ND1, COX3 and ND4 discovered between MTH and other fish species in the genus Megalobrama remain to be elucidated. Based on the comparative data of specific SNPs, we identified one population-specific marker in the D-loop region (C 549 T), which effectively separated MTH from the six other fish species in the genus Megalobrama in the absence of hybridization. All individuals of MTH used in the present study were homozygous at the specific marker, suggesting that MTH still has a relatively pure gene pool. The genus Megalobrama emerged relatively recently in Cyprinidae fish species 8,34 . M. terminalis, M. amblycephala and M. pellegrini are very similar in morphology before sexual maturity, and can hybridize with each other by artificial fertilization [35][36][37] . Recently, a culture variety of M. amblycephala bred by Li et al. began to appear in the aquaculture waters of northeast China 38 , and posed a potential threat to precious M. terminalis resources. Thus, further work is required to develop population-specific markers at the nuclear gene level for germplasm purity identification of MTH during sample collection and artificial breeding.

Amino acid variation in MTH/others b Gene
The species taxonomy of Faluo has yet to be fully accepted 7 . One of the major reasons is that Faluo shows some distinct morphological characters (number of vertebrae, length of the intestine/body length, body length/ eye diameter, and hypopharyngeal teeth type) from southern M. terminalis 7 . Additionally, Faluo was not used as a representative specimen of M. terminalis in the early classification of the genus Megalobrama 1 . In our present phylogenetic analysis, however, MTH, MTJ and MTQ were in a well-supported monophyletic group. Considering that MTH showed the highest percent identity with MTJ and MTQ, we speculate that the present species classification of Faluo is undisputable.  www.nature.com/scientificreports www.nature.com/scientificreports/ To the best of our knowledge, this is the first phylogenetic analysis including samples from three different geographical populations of M. terminalis with clear origins. MTH in particular represents a population distribution in northernmost China. In previous studies, however, samples were only taken from one population of M. terminalis other than MTH [8][9][10][11][12][13][14] . Notably, MTQ, MTJ and MPL constituted a monophyletic clade, shared a common ancestor with MTH, and further clustered with MAL and MAY in the present study. This result clearly indicates that MPL rather than MTH has a close phylogenetic relationship with MTJ and MTQ. The present results give strong evidence by utilizing complete mt genome data, supporting M. pellegrini as one of the populations of M. terminalis.
The sequence of a complete mt genome from MTH was obtained for the first time. We expect the analysis of specific markers of MTH and identification at the population level will contribute to germplasm purity detection in future sampling and breeding practices. Furthermore, we confirmed that MTH belongs to the species M. terminalis by sequence comparisons and phylogenetic tree reconstruction. The introduction of MTH provided clearer phylogenetic relationships among species in the genus Megalobrama at the mitochondrial genome level, which will also be helpful in the protection and development of this endangered species in the future.