On species delimitation, hybridization and population structure of cassava whitefly in Africa

The Bemisia cassava whitefly complex includes species that cause severe crop damage through vectoring cassava viruses in eastern Africa. Currently, this whitefly complex is divided into species and subgroups (SG) based on very limited molecular markers that do not allow clear definition of species and population structure. Based on 14,358 genome-wide SNPs from 62 Bemisia cassava whitefly individuals belonging to sub-Saharan African species (SSA1, SSA2 and SSA4), and using a well-curated mtCOI gene database, we show clear incongruities in previous taxonomic approaches underpinned by effects from pseudogenes. We show that the SSA4 species is nested within SSA2, and that populations of the SSA1 species comprise well-defined south-eastern (Madagascar, Tanzania) and north-western (Nigeria, Democratic Republic of Congo, Burundi) putative sub-species. Signatures of allopatric incipient speciation, and the presence of a ‘hybrid zone’ separating the two putative sub-species were also detected. These findings provide insights into the evolution and molecular ecology of a highly cryptic hemipteran insect complex in African, and allow the systematic use of genomic data to be incorporated in the development of management strategies for this cassava pest.

The Bemisia cassava whitefly complex includes species that cause severe crop damage through vectoring cassava viruses in eastern Africa. Currently, this whitefly complex is divided into species and subgroups (SG) based on very limited molecular markers that do not allow clear definition of species and population structure. Based on 14,358 genome-wide SNPs from 62 Bemisia cassava whitefly individuals belonging to sub-Saharan African species (SSA1, SSA2 and SSA4), and using a well-curated mtCOI gene database, we show clear incongruities in previous taxonomic approaches underpinned by effects from pseudogenes. We show that the SSA4 species is nested within SSA2, and that populations of the SSA1 species comprise well-defined south-eastern (Madagascar, Tanzania) and north-western (Nigeria, Democratic Republic of Congo, Burundi) putative sub-species. Signatures of allopatric incipient speciation, and the presence of a 'hybrid zone' separating the two putative sub-species were also detected. These findings provide insights into the evolution and molecular ecology of a highly cryptic hemipteran insect complex in African, and allow the systematic use of genomic data to be incorporated in the development of management strategies for this cassava pest.
Confidence in species identification provides insights into the evolutionary forces that drive species diversification, and it underpins the strategies for biological conservation and the management of natural resources. While morphological differences have enabled many species to be identified, cryptic species complexes present a particular challenge due to their general lack of readily distinguishable morphological characters [1][2][3] . Ecological and life history data including host plant use and mating behaviour are needed to mitigate and address these challenges [3][4][5] . This is often time-consuming and impractical especially with large numbers of samples across wide geographic range. Incorporating DNA markers, e.g., diagnostic mitochondrial DNA (mtDNA) markers [6][7][8] , genome-wide sequencing 9,10 , and whole-genome resequencing 11 can increase confidence in differentiating cryptic species in a complex and identifying interspecific hybrids.
The challenges can become critical when individual cryptic species in a complex have a serious impact on plant health or agricultural productivity: insect vectors of plant viral diseases affecting cassava (Manihot esculenta), the most important food crop grown in Africa 12 , is one example of such case. Cassava production is vulnerable to cassava mosaic disease (CMD) caused by a number of cassava mosaic geminiviruses (CMGs) 13,14 and CMD has been reported in Africa since the early 1970s 15 and more recently in South-East Asia 16,17 . While the vector responsible for the spread of the CMD in South-East Asia is not well-understood, the pandemic in east and central Africa was associated with the distribution of sub-Saharan African (SSA) species of the Bemisia whitefly complex 13 . Invasive whitefly species such as the Mediterranean (MED), Middle East-Asia Minor 1 (MEAM1), and SSA2, have become pests of global agriculture, spreading plant viral diseases and becoming resistant to chemical control agents 18,19 . In particular, the endemic African cassava Bemisia whitefly SSA2 species occupies regions affected by the CMD pandemic. SSSA2 has also expanded into southern Europe since 2007 20 whereas SSA1 has not (though widespread throughout the African continent). Defining the actual CMD vectors in Africa remains a challenge due to the cryptic nature of the many species in the Aleyrodidae whitefly genus Bemisia 3 www.nature.com/scientificreports/ Historically, the lack of distinguishable morphological characters among Bemisia tabaci has resulted in geographically distinct species being reclassified as a single species [23][24][25] and underpinned the B. tabaci taxonomic confusion (e.g., review by De Barro et al. 18 ; see also Tay et al. 26,27 ). Currently, over 38 cryptic species are recognised within the B. tabaci (Gennadius) species complex 24 through molecular diagnostics of a partial (i.e., 657 bp) mitochondrial DNA cytochrome oxidase subunit I (mtCOI) 18,28,29 , mating experiments 5,30,31 (although low numbers of F1 hybrids 10,32 including between SSA1 and SSA2 (previously 'Ug1' and 'Ug2') 33,34 are known), behavioural 4 , and host preference 3 studies.
Re-analyses of partial mtCOI sequences relating to the MED species 3 , and of the B. tabaci standard partial mtCOI dataset 33 demonstrated that ca. 60-65% were nuclear mitochondrial DNA sequences (NUMTs) or other PCR artefacts 10,29,35 . These analyses, together with molecular characterisation of original voucher specimens, have begun to disentangle the species complex 26,27 , identified historical species ranges 26,27,36 , and provided phylogenetic evidence that the 'B. tabaci cryptic species complex' is a polyphyletic group derived from various SSA Bemisia whitefly species' 29 . Delimitation of populations across the full African Bemisia species habitat range will be needed. This would help to ascertain whether any of these species, e.g., SSA1, is genetically distinct with limited gene flow from others either due to external (e.g., geographic or climatic limitation) or internal (e.g., presence/ absence of secondary symbionts) factors including host association.. Recognition of such taxonomic classifications and their distribution in sub-Saharan Africa will be important for management of these pest species that could pose a threat to plant health and global agricultural biosecurity 37 .
The advent of high-throughput sequencing (HTS) technologies has driven the availability of genome-wide data that is further transforming the Bemisia taxonomy. HTS has demonstrated that various species previously recognised via characterisation of mtCOI (e.g., SSA4 29 ; Middle East Asia Minor 2 (MEAM2) 10,29,35 ) were likely artefacts caused by the sub-optimal primers 16,17 and sequencing of NUMTs 10,35 . Analyses of mitochondrial genomes also demonstrated inconsistencies of B. tabaci cryptic species identification based on the standard partial mtCOI gene region, and cautioned the need to combine behavioural and molecular evidence to delimitate species status 3,31 .
Despite the recent advances and their importance in the African agricultural landscape, the taxonomy, population structure, and evolutionary genetics of the African cassava Bemisia whitefly species complex (i.e., SSA1, SSA2, SSA3, SSA6, and SSA9) 29 remain poorly understood. A recent study of taxonomic structure and gene flow among genetically diverse SSA1, SSA2, SSA3, and SSA4 species 38 described serious discrepancies between their results obtained from mtCOI and from nextRAD sequencing. This (the conflicting results of mtCOI and nextRAD sequencing) findings has strongly implies that the species delimitation of sub-Sahara African Bemisia species needs to be revisited since it is currently based on mtCOI. We seek to show why mtCO and nuclear DNA should result in such conflicting implications for the African cassava Bemisia whitefly SSA1 (which is also further refined into subgroups (SG) 1, 2, and 3 in, e.g., Legg et al. 13 ), SSA2, and SSA4 species. With the cassava Bemisia whitefly being the most common whitefly on cassava crops in Africa, and capable of vectoring diseases (e.g., cassava mosaic begamoviruses, CMBs; cassava brown streak ipomoviruses, CBSIs) known to cause significant crop losses 39 , our findings will also provide insights into incipient speciation of this agriculturally important Bemisia species complex with the capacity to impact on food security in sub-Saharan Africa 40,41 .

Results
Wosula et al. 38  Integrity of African cassava Bemisia SSA mtCOI sequences. Berry et al. 42 reported 11 'Sub-Saharan III: Western Africa-Cameroon/Cassava' sequences which were renamed as SSA4 18 . Of these 11 SSA4 sequences, only two short sequences (i.e., Cam Ayos 1 WO2, Cam Ayos 2 WO3) did not have INDELs or exhibit any premature stop codons at the amino acid level (Suppl. Fig. 1). The remaining nine sequences had random INDELs that resulted in frame-shift mutations and premature stop codons, which suggested that they are likely to be NUMTs. The Cam Ayos 2 WO3 sequence also had nucleotide substitutions that resulted in amino acid residue changes in highly conserved mtCOI regions, suggesting that this was likely also a NUMT sequence. The remaining single, short but 'functional' Cam Ayos 1 WO2 sequence (AF344246) that phylogenetically grouped 42 with the 10 NUMT-related SSA4 sequences suggested that as a whole, these SSA4 sequences originally reported by Berry et al. 42 were all likely NUMTs. The fact that SSA4 sequences were phylog enetically clustered to SSA2 suggested they were also potentially of nuclear origin, similar to that reported for B. tabaci MEAM2 NUMT 10,35 that originated from the nuclear genome of MEAM1 10,35 .
Further evidence that the partial mtCOI gene lacks the power to differentiate between evolutionary very closely related cryptic species (e.g., the 'subgroups' (SG) 13,43 taxonomic status within SSA1) can be seen by examining nucleotide substitution patterns. For example, re-assessment of the two SSA1-SG5 sequences (MF417585, MF417586) showed that they differed from the SSA1-SG3 sequence JQ286482 by only three C/T substitutions that resulted in 0.56% nucleotide difference (Suppl. Fig. 2A). The basis for suggesting SSA1-SG5 and SSA1-SG3 to be different 'subgroups' within the SSA1 species was therefore unclear. For example, sequences with similar (Suppl. Fig. 2B) or with greater (Suppl. Fig. 2C) numbers of nucleotide substitutions were placed within same subgroups, or potentially as the same species 31 (Suppl. Fig. 2D).
Admixture analysis identified three genetic clusters (Fig. 3) that followed the partial mtCOI gene-based species diagnosis (see also Suppl. Fig. 3), whereby the SSA2 species was clearly shown to be different from the SSA1 species. The SSA1 species was further shown to consist of two distinct genetic backgrounds corresponding to SSA1-NW and SSA1-SE (containing 'blue' SSA-WA and SSA-ECA, and 'red' SSA-CA and SSA-ESA, respectively, as defined by Wosula et al. 38 ). Some SSA2 and all the SSA1-CA samples showed evidence for low levels of admixture.
Interestingly, F st analysis (Table 1) supported SSA1-NW and SSA1-SE as potential putative sub-species of SSA1, with intermediate F st estimates observed for all pair-wise comparisons at the inter-specific (i.e., SSA2 vs. SSA1; F st = 0.286-0.334) level. This was particularly evident for SSA1-NW versus the SSA1-SE 'ESA' population, with F st of 0.226-0.243. F st differentiation was less clearly defined for the SSA-CA population, whose F st scores versus SSA1-NW were closer to those between populations within the two major SSA1 groups. The overall differentiation between SSA1-NW and SSA1-SE suggested likely existence of putative sub-species within SSA1, the observations concerning the SSA-CA population suggest gene flow between neighbouring populations. Treatment of SSA1 as single species against the SSA2/SSA4 species showed significant population substructure ( Within the putative SSA1-NW sub-species, the SSA-WA population is predominantly found in the lowlands (ca. 300-1500 m), whereas the SSA-ECA/SSA-CA populations were found at higher altitudes (from ca. 600-≥ 1500 m). Individuals belonging to the SSA-ESA group were predominantly from coastal landscape of Tanzania and Madagascar (Fig. 4). A gene flow contact zone was detected at regions represented by five DR Congo individuals and one Tanzanian individual (i.e., the SSA-CA population). Inter-specific hybridization www.nature.com/scientificreports/ was detected in two SSA2 individuals (Fig. 3, green/blue genetic backgrounds) and in one DR Congo individual from SSA-CA (red/green/blue in Fig. 3). The Treemix ML tree supported the SSA4 species as part of SSA2, and the migration edges (m = 2) revealed gene flow in both directions, between SSA1 and SSA2 species from central Africa (i.e., SSA-CA), and supports detection of hybrid individuals identified in the Admixture analysis (Fig. 3). The weight of the migration edge going from SSA-CA to SSA2 is also higher than the one going the opposite direction (Fig. 5). TreeMix ML tree also showed a lack of branch depth between each cluster of SSA2/SSA4, ECA/WA, and CA/ESA, thereby provided further support that entities within the three clusters were identical.

Discussion
We combined genome-wide SNPs and mtCOI markers to gain evolutionary, ecological and population genomic insights into the sub-Saharan African cassava Bemisia whitefly species complex. By taking into consideration the impact of NUMTs, we showed no conflict in the same nuclear and mtCOI dataset in defining Bemisia species status. A thorough filtering and reanalysis of genome-wide SNP markers also detected signatures of incipient speciation and admixture in the Bemisia SSA1 species. While various evolutionary genetic studies based on the partial mtCOI gene have proposed the presence of SSA1 species subgroups 13,46 , we caution against the use of the mitochondrial 'subgroup' classification based on genome-wide SNP markers. Mating experiments were inconclusive in separating subgroups of SSA1 especially between SG1 and SG2 31 , while for the first time, genome-wide SNP loci showed that the SSA1 species could potentially be defined as to consist of a north-western (NW) and a south-eastern (SE) putative sub-species, with a contact zone separated by the eastern highland and central/ western lowland regions.
Wosula et al. 38 adopted the nextRAD genome-wide SNP approach first demonstrated by Elfekih et al. 10,[47][48][49] for Bemisia species to investigate gene flow patterns and diversity in the SSA1, SSA2, and SSA4 African cassava Bemisia whitefly species complex, but overlooked the impact of NUMTs 10,32,35 in published B. tabaci phylogenetic studies based on partial mtCOI sequences. This led to the conclusion that the mtCOI gene lacked efficacies at differentiating African Bemisia cryptic species. Increasingly, reanalyses of the B. tabaci cryptic species complex partial mtCOI gene dataset have detected the presence of significant NUMTs and pseudogenes 3,29,35 . These studies highlighted on-going challenges associated with species delimitation in the cryptic B. tabaci species complex. Similar to MEAM2 28,50 , our analysis supported that the SSA4 42 was likely NUMT artefacts as previously proposed 29 . The misidentification of SSA4 underpinned all subsequent genomic interpretations of gene flow and evolutionary ecology in the cassava SSA Bemisia whitefly species 38 .
Significant genetic diversity in the B. tabaci species complex can obscure species/putative sub-species identification. For instance, high genetic diversity in the MED species based on nextRAD SNP data has been reported 10 ; however, the MED-ASL as a separate species was only further supported by including also complete mitogenome, mating experiments 3 , and host preference studies 5 . Genomics and mating experiments similar to the MED-ASL studies 3,5 showed that the SSA1-SG1 and SG2 were likely the same species, with SG3 potentially being different 31 (i.e., the 'WA' clade; Fig. 2B,C). Similar studies are needed to provide additional biological support for various SSA Bemisia species subgroups 51 including the putative SSA1-NW and SSA1-SE sub-species.
In developing the DNA barcoding approach, Hebert et al. 52 showed that the COI gene is overall effective for species level delimitation due to the associated nucleotide substitutional rates for this gene (but see Hanemaaijer et al. 53 ), while for fine-scale identification such as within species level, the use of genes with greater mutational rates were recommended. Partial mtCOI gene therefore lacks the power to differentiate between evolutionary very closely related cryptic species status, for which a genomic approach should be considered 3,31 . Subgroup classification within the SSA1 African cassava Bemisia whitefly species based on the mtDNA gene might be possible, provided that a gene region with an appropriate substitutional rate is identified. Nevertheless, there is currently no evidence for maternal lineage-based patterns to population dispersal behaviour, host adaptation, and fitness Branch colour schemes (red, green, blue) are as used in Admixture analysis (see Fig. 3). Sample names and branch tip colour schemes (filled circles) followed Wosula et al. 38 , and where available, mtCOI GenBank accession numbers corresponding to selected samples, and their proposed mtCOI SSA1-subgroups (SG) 13 status (i.e., SG1, SG2, SG3, SG5 44 ) are also listed. Individuals that have partial mtCOI gene sequence but have been excluded from the nextRAD genome-wide SNPs phylogeny analysis due to poor SNP data coverage are boxed. The Bemisia SSA2-SSA4 clade is indicated by green branches, and the SSA lineages as proposed by Wosula et al. 38       www.nature.com/scientificreports/ cost differentiation in the B. tabaci complex and the African cassava Bemisia species complex, and therefore the subgroup delimitation based on the partial mtCOI gene is not justified. The SSA1-SG1, representing the major SSA1 mtCOI subgroup, was previously implicated in the CMD pandemic in East Africa 13 . Wosula et al. 38 reclassified SSA1-SG1 individuals (e.g., SSA1-SG3, SSA1-SG5 44 ) into three SNP-based groups of SSA-ECA, SSA-CA, and SSA-WA, due to discrepancies between the 657 bp mtCOI partial gene and nuclear SNP markers. Based on the same nuclear and mitochondrial dataset, we showed that while separate phylogenetic signals were detected for the 'WA' and 'ECA' populations, a strong genetic difference between the 'CA' and 'ESA' populations was lacking, and we therefore advocate following established good practice 51 to refrain prematurely adopting other nomenclature (i.e., those proposed by Wosula et al. 38 ).
Although refined higher-level species status could help resolve virus transmission capacity and provide insights into understanding pest introduction pathways 11,54 , there is currently insufficient evidence in African cassava Bemisia whiteflies to link either the WA/ECA or CA/ESA populations to historical CMD outbreaks. The B. tabaci SSA2 (i.e., 'Ug2' 33 ) was assumed to be associated with the 1990s CMD epidemic, mainly due to its geographic distribution in relation to the front of the epidemic 55 . However, transmission underpinning outbreaks of CMD in the sub-Saharan Africa region is complex, with one factor being the sharing and propagation of diseased cassava cuttings between farmers. There was a recombination event between two geminiviruses that produced a more virulent hybrid 56 , however we have little historical information about how the viruses interacted with the different Bemisia cryptic species in terms of disease dynamics 57 . Evidence for bidirectional interspecific hybridization detected between SSA1 and SSA2 in this study, therefore may have implications to understanding the evolutionary genetics associated with plant virus transmission capacity by the cassava SSA Bemisia whitefly species complex. The role of the SSA2 species in the CMD epidemic nevertheless remained unclear, with the SSA2's habitat range (east and west Africa) overlapping the SSA1 species 22,38,55,[58][59][60][61][62] . The exact relationship between SSA1/SSA2 and outbreaks of CMD 63 should be re-visited, taking into consideration current genomic evidence for the presence of SSA1/SSA2 hybrids. Furthermore, vector competencies of other common cassava whitefly species (e.g., B. afer) to different cassava viruses also need further investigation, an ideally include a genomic component. www.nature.com/scientificreports/ B. tabaci sensu lato on cassava in sub-Saharan Africa is a species complex going through a continuous diversification process. The putative SSA1-SE sub-species that the Madagascan (SSA-ESA) and Tanzanian (SSA-CA) populations belong to are geographically isolated from the Nigerian (SSA-WA), DR Congo, and Burundi (SSA-ECA) populations with a hybrid zone between Tanzania, Uganda and DR Congo. The East African Rift Valley is a major geographical barrier to gene flow and provides the environment to promote incipient speciation and to enable recent radiations to thrive and evolve 64 , and could underpin incipient allopatric speciation in the SSA1. The "rift effect" acting as a gene flow barrier has been described in other insects (e.g., the maize stalk borer Busseola fusca 65 , the malaria vector mosquitoes Anopheles gambiae 66,67 ) and vertebrates (e.g., birds 68 , frogs 69 ). Specifically, the western branch of the Rift Valley, i.e., the Albertine Rift covering part of Tanzania, Burundi, Rwanda, Uganda and DR Congo, is a mountain chain situated at 1500-3500 m above sea level (Fig. 4) formed by volcanic activity since the plio-pleistocene. In addition to volcanic activities, climate change during this glacial period led to habitat fragmentation that promoted species diversification in the newly formed biogeographical zones 70 . Species diversification in the African Bemisia whitefly cryptic species system has also been explained by host plant associations and their ability to feed on diverse host plants and acquiring new ones 71 . Using RNAseq, Malka et al. 72 provided insights into how the variation in detoxification gene expression levels might have conferred plasticity that likely also contributed to species diversification.
Finally, we note that our genome-wide SNP phylogenetic analysis showed weak support (bootstrap value 34%, data not shown) for closer evolutionary relationship between SSA2 and SSA1-NW, than between the two putative SSA1 sub-species. This closer phylogenetic relationship between SSA2 and SSA1-NW is likely to be unreliable given the weak bootstrap support, greater gene flow patterns between the SSA1 sub-species (Table 1), and the congruence of species status based on mtCOI molecular diagnostics. our study therefore further highlights the need for careful selection of outgroups that takes into consideration the evolutionary biological questions to be addressed (e.g., whether the target species were more closely related (i.e., between SSA1, SSA2, SSA3, SSA6, SSA9) 29 or evolutionary diverged (i.e., between SSA1 and AsiaII_1) 29 ). Data obtained for this population genomic reanalysis involved only limited number of target and outgroups individuals and species that represented significant evolutionary divergent lineages from the SSA1/SSA2 species 38 . The challenge of obtaining sufficient sequencing coverage that is necessary for filtering to obtain a robust set of genome-wide SNP markers for the Bemisia cryptic species complex 10,47-49 known to have a large and complex genome organization 73,74 will likely further enhance and perpetuate their on-going status as one of the world's most damaging and challenging group of agricultural insect pest complex.

Material and methods
mtCOI data handling. The mtCOI sequences of Wosula et al. 38 were downloaded from GenBank (MF417578-MF417602), including all published reference sequences used to identify the SSA1 'genetic groups' . These partial mtCOI sequences were imported into Geneious R11.1 and aligned with reference sequences (Suppl. nextRAD data processing. The raw data from nextRAD sequencing 38 was accessed from GenBank (SRP103541). The fastq sequences were screened using FastQC v.0.11.4 for quality control. The reads were trimmed by quality and reads longer than 151 bp was trimmed to this length using Trimmomatic 76 version 0.36.
Mapping quality. We mapped the reads from each individual sample to the B. tabaci MEAM1 genome 73 using Burrows-Wheeler Aligner (BWA) v.0.7.12 77 . The alignments used the BWA-MEM algorithm with default settings. The SAM files were converted to BAM output using Samtools 78 v. 1.3.1. The BAM files were subsequently sorted, indexed and checked for quality and mapping percentages per scaffold. SNP genotyping. SNPs were first called using a de novo approach in PyRAD v. 0.9 79 . PyRAD is a pipeline designed for RADseq datasets that aims to capture variation at the species/clade level. It allows clustering of highly divergent sequences and takes into consideration indel variation. We also performed SNP calling in PyRAD, using reads mapped to the MEAM1 reference genome 73 .

Genetic clusters and tests for introgression.
Genetic clusters within the dataset that consists of 62 SSA samples were investigated by Principal Component Analysis (PCA) using the SNPRelate R package 80 . We used ADMIXTURE v.1.3.0 81 on the 62-sample dataset to identify genetic clusters and estimate the genetic ancestry of the SSA samples. We ran the program with K ancestral clusters varying from 1 to 10 with each 'K' value repeated 100 times. A cross-validation test was performed to determine the optimal K value.
Phylogenetic analyses. The topology of the phylogenetic relationships between individuals/populations/ sub-species and species within the dataset (62 SSA samples and three outgroup individuals (TZ_CHA1, TZ_ CHA2, TZ_KIL1)) was examined using a maximum likelihood (ML) approach. The SSA species status were first identified using the mtCOI marker, then, a phylogeny was reconstructed using the nextRAD genome-wide SNPs excluding samples with low genotype quality (mapping quality < 10) to minimize biases that could potentially be introduced by missing data. We performed five replicates of phylogenetic reconstruction in RAxML v.7.2.8 82 using the GTR substitution model and GTRGAMMA as the GAMMA model of rate heterogeneity, with 1000 bootstrap replications. TreeMix 83 v. 1.12 was used to identify genetic mixing, the history of population splits Scientific Reports | (2021) 11:7923 | https://doi.org/10.1038/s41598-021-87107-z www.nature.com/scientificreports/ and admixture patterns. We ran simulations in order to infer the best-fit scenario for the genetic relationships between these cassava Bemisia whitefly populations involving up to 5 migrations events with 100 bootstrap replications (e.g., based on previous studies using this type of analysis 49,84,85 . The likelihoods were compared using the following command in R: tail -n 1 Data/data.frq.strat.tree.*.*.llik | cut -d \: -f 2 > Data/data. The decision to select the number of migration m = 2 is based on the residual fit matrix generated by the TreeMix program. For each migration (m = 0, m = 1…, m = 5), we generate a residual plot. When this plot is saturated at a given m, the best fit scenario and the best-fit ML tree that corresponds to the specific m number are obtained. If it is m = 2, then any number bigger than 2 would not add any more information. Programming scripts used in this study are available as Supplemental Data 1-7.