Evolutionary study of maize dwarf mosaic virus using nearly complete genome sequences acquired by next-generation sequencing

Next-generation sequencing is a robust approach to sequence plant virus genomes in a very short amount of time compared to traditional sequencing methods. Maize dwarf mosaic virus (MDMV) is one of the most important plant viruses worldwide and a significant threat to maize production. In this study, we sequenced 19 MDMV isolates (10 from Johnsongrass and 9 from maize) collected in Oklahoma and Missouri during 2017–2019 using Illumina sequencing and determined the genetic diversity. Sequence reads were assembled and 19 nearly complete genome sequences of MDMV isolates were obtained. Phylogenetic analysis based on complete genomes nucleotide and amino acid sequences revealed two main clusters and a close evolutionary relationship among 19 MDMV isolates. Statistical analysis of individual genes for site-specific selection revealed that all genes are under negative selection. The fixation index (FST) analysis of the MDMV isolates revealed no gene flow between the two main phylogenetic clusters, which emphasizes the divergence of MDMV isolates from the USA. Among the USA MDMV isolates, the mean genetic distance (d) and nucleotide diversity ((π) were highest in the P1 gene coding region. This is the first detailed study on the evolutionary relationship of MDMV isolates based on the nearly complete genome analysis from maize and Johnsongrass.

www.nature.com/scientificreports/ MDMV is a single-stranded, positive sense RNA genome (~ 10 kb) virus which is responsible for one of the most important viral diseases in cultivated maize and sorghum 14 and can cause up to 70% yield loss in maize 15,16 . In 2015, MDMV resulted in the loss of 2376, 599 bushels of corn in the US and Ontario, Canada 17 . MDMV can infect more than 200 susceptible grass species including Johnsongrass (Sorghum halepense L. Pers) (abbreviated as JG), which is the natural overwintering host for this virus 14 .
In the USA, MDMV was first reported in 1960. Since then, it has been reported in > 37 states [18][19][20][21][22][23] . In Oklahoma, MDMV was reported for the first time from JG in 2017 24 . Subsequently, we collected a large number of MDMV isolates from both maize and JG and determined the genetic diversity of MDMV isolates based on the coat protein (CP) gene sequences 25 . We also reported the first MDMV complete genome from JG and together there are only seven complete genomes of MDMV available in GenBank, where only one of them is from JG, while all others are from maize 25 . There is no comprehensive study revealing evolutionary relationships of these MDMV isolates due to a lack of additional complete genome sequences. It is important to study the complete genome sequences of MDMV isolates in both maize and JG to understand if the virus is mutating and moving towards being novel strains.
Therefore, in this study, we sequenced 19 nearly complete MDMV genomes from both maize and JG using Illumina sequencing, which contained the full coding region of the virus with varying 5′ and 3′ untranslated regions. Phylogenetic relationships and selection pressure based on coding sequences of the genomes as well as on individual genes sequences of MDMV isolates were analyzed.

Results
Field symptoms of MDMV. The most common field symptoms of MDMV in both hosts (JG and maize) were yellow streaks, chlorosis, and mild to severe mosaic on the leaf blades. Symptoms of a few samples from the maize plants, which were collected in different locations in Oklahoma, are shown in Fig. 1a-e, Symptoms of MDMV on JG have been reported in our previous work 25 . VLPs and virion RNA isolation. Virus-like particles (VLPs) preparations were successfully obtained from all 19 selected dot-immunobinding assay (DIBA) positive MDMV samples. VLPs from a few MDMV isolates were observed by electron microscope and all of them contained typical long flexuous filamentous particles of MDMV ( Fig. 2A). VLPs from all MDMV isolates (Table 1) were run on an SDS-PAGE gel and showed the presence of probable capsid protein for MDMV. VLPs of selected MDMV isolates showing capsid protein of approximately 39 kDa has been shown in Fig. 2B. RNA was successfully extracted from all 19 VLPs. The quality of viral RNA was in the range of 1.9-2.0 at 260/280 ratio and the concentration ranged from 10 to 50 ng/µl. Nearly complete genomes of MDMV isolates. The length of the genomes among 19 MDMV isolates varied from 9204 to 9497 nucleotides (nt) ( Table 1). In all MDMV contigs, the length of ORFs were the same, but differences were noticed in the lengths of the 5′ and 3′ untranslated regions. The variability in the UTRs may not be an accurate representation of these genomes because no 5′ and 3′ RACE-PCR was conducted and is probably the result of differences in the assemblies Intact coding regions of polyprotein for these 19 genomes showed 94-95% identity to the MDMV-Bixby1 isolate (accession no. MK282423). All MDMV sequences obtained in this study were deposited in GenBank with the accession numbers MW026050-MW026068 (Supplementary  Table S1). All USA MDMV isolates contained an additional 39 nt in the N terminal region of the CP gene as described before 25,26 . The length of the polyprotein for all 19 isolates was 3053 amino acids (aa), which is cleaved into ten different protein products including P1, P3, 6K1, 6K2, CI, CP, HC-Pro, NIA Pro, NIa VPg, NIb and PIPO as reported before 27,28 . Phylogroup G2 contained the eight remaining MDMV isolates, including two each from the USA and Hungary, and one each from Italy, Iran, Bulgaria, and Spain (Fig. 3A). The genetic distance (d) value for the entire G1 group including 20 MDMV isolates was 0.027 ± 0.001, while for the G2 group, the d value was 0.142 ± 0.003. Members of the G2 group have diverged more from each other compared to members of the G1 group, which has a lower d value indicating closer relatedness to each other. A large number of coding regions of both phylogroups were under negative selection, while the dN/dS ratios for individual phylogroups were below 1, which also indicates that these isolates are under negative selection. The fixation index (FST) value between G1 and G2 phylogroups was 0.358, indicating no apparent gene flow (Table 2), and was above the threshold of FST > 0.33 value 29,30 .
Within G1, three MDMV isolates KA13, KA15 and KA16 ( Fig. 3A-C) clustered together similar to individual gene trees (Fig. 4A-K). These three isolates were collected from the same Kay County of Oklahoma (Table 1) in two different fields, where KA15 and KA16 were collected from the same field, and KA13 was collected from a different field. MDMV isolates in the G1 group do not indicate clear segregation based on the host or the collection year. Isolates collected from both hosts (maize and JG) in different years clustered together in the G1 phylogroup.
Individual gene phylogenetic trees of 28 MDMV isolates showed ( Fig. 4A-K) a similar pattern to the complete genome tree except for the P1 and CP genes trees. In the P1 gene tree, the MDMV OS1 isolate clustered in G2, while the MDMV isolate from Italy clustered in the G1 phylogroup. Similarly, in the CP gene tree an MDMV isolate from Spain clustered in the G1 phylogroup. Within G1, variation in branching pattern can be observed based on individual gene trees. For example, KA13, KA15, and KA16 MDMV isolates consistently grouped into one sub-cluster within the G1 phylogroup across all individual gene trees ( Fig. 4A-K). There was no clear www.nature.com/scientificreports/ clustering of MDMV isolates based on their host, but this could be on the basis of location as evident from phylogenetic trees (Figs. 3A-C, 4A-K, Supplementary Fig. S1).

Gene by gene selection analysis.
In order to evaluate selection pressure on each coding region for the individual gene as well as the entire coding region of 28 MDMV isolates used in this study, the dN/dS ratios and number of negative and positive selected codons were identified using several analytical methods using  Table 1). The tree was constructed in MEGA6 program using Tamura Nei model with 1000 bootstrap replications. The GenBank accession no. of each isolate is followed by name of the country. www.nature.com/scientificreports/  www.nature.com/scientificreports/ indicated that HC-Pro had the highest nucleotide diversity and genetic distance in maize MDMV population, while for JG population it was the P1 coding region.

Analysis of recombination among MDMV isolates.
A total of 36 recombination events were detected when 28 complete genomes sequences of MDMV were analyzed by RDP4.69 software (Table 5). Among those recombination events, only 10 were found to be significant because they were detected by at least five or more RDP implemented programs with significant recombination events ( Table 5). Most of the recombination events were located in the P1 and HC-Pro genes and some were in the CP and 3'UTR regions involving mostly the USA MDMV isolates.

Discussion
Within the span of 40 years, DNA sequencing has improved dramatically. NGS, is more expensive compared to traditional Sanger sequencing, but efficiently provides a large amount of data. Earlier next generation sequencing methods such as 454 pyrosequencing were used to quickly cover entire viral genomes, which helped dramatically in novel virus discovery, plant virus biodiversity, and evolutionary and ecology-based studies 6 . Currently, many novel plant virus discoveries have successfully and efficiently used Illumina sequencing 7,31 . Therefore, the 19 MDMV genomes sequenced in this study using Illumina sequencing provided sequences in performing evolutionary studies on such a range of MDMV isolates.   Table 4. Population based analysis of genetic parameters for individual coding region of 28 MDMV isolates. a According to Nei 44 . b 20 MDMV isolates obtained from Maize and Johnsongrass (Fig. 1, Table 1). c Eight MDMV isolates used for comparison ( Fig. 1, Table 1). d Eight MDMV isolates obtained from Maize in this study (Fig. 1, Table 1). e 12 MDMV isolates obtained from Johnsongrass (11 in this study and one in previous study) (Fig. 1a, Table 1).

Gene US b isolates
Nucleotide diversity (π) a

All together US isolates
Overall mean Genetic distance (d) www.nature.com/scientificreports/ MDMV is the most common among virus diseases of maize and can infect more than 200 grass species 14,26,32 . Complete genome sequence information is important to understand the virus evolution based on the entire genome as well as individual genes of a virus and their molecular relationships with other available isolates collected from different regions and countries. Therefore, in this work, an additional 19 MDMV isolates were sequenced and determined their relationships with the available MDMV isolates in the GenBank. The complete ORF of 28 MDMV isolates has indicated strong evolutionary relationship between isolates that contained the 39 nt insertion in the N terminal region of the CP gene (G1 phylogroup) and on the origin from where they were collected as reported in our previous study 25 . Within the G1 phylogroup, MDMV isolates from both maize and JG clustered together showing no host preferences as observed previously in the Hungarian MDMV isolates 26 . These results indicate that MDMV isolates may frequently originating from JG and moving towards maize by vector transmission but further experimental evidence is needed in future studies.
In G2 phylogroups, MDMV isolates were polyphyletic but more complete genome sequences of MDMV isolates from these countries are needed to get a better understanding of phylogenetic relationships within the G2 phylogroup. Similar evolutionary relationships were obtained based on the CP gene, where MDMV isolates clustered into two main groups based on the 39 nt insertion in the CP gene 25 .
An individual phylogenetic tree for each gene of the MDMV isolates showed a similar tree topology with the one obtained from complete genome sequences (Fig. 3A-C). Further analysis of individual genes for their genetic diversity (π), mean genetic distance (d), and selection pressure in different MDMV populations (Table 5) indicated the P1 gene is associated with the greatest genetic diversity among US isolates and therefore, is the most phylogenetically informative. All the individual genes were under negative selection, where NIb, HC-Pro and CI have the highest number of negatively selected codons. These genes are associated with viral genome replication and are under strong purifying selection to assure viral genome replication. P1 and CP genes have less negatively selected codons compared to other genes, indicating these genes might allow some variation within the coding region and could be more informative in an evolutionary prospective.
Analysis of a larger number of complete genome sequences has improved the resolution of evolutionary relationships among MDMV isolates, where isolates are grouped into G1 and G2 phylogroups (Fig. 3A,B). The tree topology of the G1 phylogroup did not vary with or without recombinant MDMV isolates obtained in this study (Fig. 3A,B), indicating that the USA MDMV isolates are unique from MDMV isolates reported from other countries. These results were further supported by ML tree obtained from complete aa sequences of MDMV isolates (Fig. 3C). Recombination analysis of all MDMV isolates showed that significant recombination events were observed among USA MDMV isolates (Table 5). This means that recombination could be a major reason for greater diversity observed among USA MDMV isolates. In addition to mutations, recombination could be a major factor contributing to the emergence of a unique MDMV population in the USA, as is evident in Fig. 3A-C where USA isolates cluster separately from the rest of the MDMV isolates reported from other countries.
The use of the CP gene for the generation of MDMV phylogenies, may be appropriate in cases where complete genomes are not available, due to the congruency observed between the CP and complete genome trees from this study. The CP gene of plant viruses has been used in many evolutionary studies in the past because the N terminal region is very informative with high genomic variability 14,26,[33][34][35] , while the core region is more stable, providing enough information in understanding MDMV evolution 25 . In the case of MDMV, the CP gene is evolutionary more informative with the 39 nt insertion as shown in Supplementary Fig. S1, which is a clear evolutionary event where all the isolates that do not have the insertion are ancestral in origin 25 . All the USA MDMV isolates, whether sequenced previously 25 or in this study, were collected from Oklahoma and Missouri and contained the 39 nt insertion in the CP gene, which is known to be a duplication event that happened at one point in the virus evolution 26,35 , and suggests that this insertion stabilized over time. The length of the CP gene can vary due to the presence of this 39 nt insertion (i.e. < 876, 888 and 915 nt; 292, 296 and 305 aa, respectively) 26,35 . For example, MDMV OH-1 and 2 isolates (Table 1) had shorter CP gene sequences lacking 39 nt insertion and both of these isolates were maintained using successive mechanical inoculations 36 . MDMV survives in JG rhizomes during the winter, and in the spring, aphids transmit the virus in a non-persistent manner to maize. MDMV isolates from two states in this study contained 39 nt insertion in the CP gene and caused severe maize dwarf disease in susceptible maize cultivars. We previously proved the aphid transmissibility of the MDMV Bixby1 isolate, which has the 39 nt insertion 25 .
In conclusion, this is the first study that uses a larger number of MDMV isolates from both maize and Johnsongrass and has provided a detailed evolutionary analysis of the nearly complete genome sequences as well as of individual genes of the virus. Our study has shown a better understanding of the molecular diversities and population structures among MDMV isolates as well as the best candidate gene for MDMV phylogeny for future studies. In addition, this work confirms the stability of a 39 nt insertion in the N terminal region of the CP and its importance in MDMV evolution. However, further studies could focus on understanding the function of the 39 nt insertion in the CP gene for better understanding of its contribution in MDMV evolution and severity of maize dwarf disease. The information obtained in this work would be helpful in the management strategies of MDMV in maize, and for future evolutionary studies of MDMV isolates reported worldwide.

Materials and methods
Sample collection. Maize  www.nature.com/scientificreports/ Serological detection. All of the collected maize and JG samples were tested by DIBA as described in our previous study 25,37 . A total of 19 MDMV DIBA positive samples (Table 1) were randomly selected as a representative sample from each of the seven counties in two states based on the number of fields, location and states and used it for further molecular characterization in this study (Table 1).

Virus like particle preparation, SDS-PAGE and virion RNA extraction.
One to two grams tissues from each of the 19 MDMV DIBA positive samples were used for virus like particles (VLPs) preparation as described before 38 . VLPs pellets from each individual MDMV isolate was resuspended in 50-200 µl of nuclease free water. To observe virion particle morphology, few VLPs samples were checked on Transmission Electron Microscope (TEM) according to procedure reported previously 39 . To confirm the presence of capsid protein, an aliquot of 10 µl of each VLPs suspension was mixed with equal volume of 2 × SDS loading dye 40 , run on 12% SDS-PAGE and stained as described before 39 .
After confirming the presence of capsid protein on SDS-PAGE, purified MDMV RNA was extracted from VLPs of each MDMV isolate using RNeasy Plant Mini Kit (Qiagen, MD, USA), and according to the manufacture's protocol, 100 µl of VLPs was mixed with 450 µl of RLT buffer. Viral RNA was eluted in 30 µl of RNase free water. The quality of RNA (using 260/280 ratio) and concentration was determined using NanoDrop8000 Spectrophotometer (Thermo Fischer Scientific, Waltham, MA, USA).
Illumina Nextera XT library preparation. Two step reverse transcription-polymerase chain reaction (RT-PCR) was performed with random hexamers to cover the complete genome of MDMV. The RT step including 10 µl of viral RNA, 4 µl 5× RT buffer, 2 µl of 10 µM dNTPs; 2.0 µl of 20 µM of random-hexamers primer, 1.0 µl of RNase inhibitors and 1.0 µl MMLV reverse transcriptase according to the manufacturer's instructions (Genscript, Piscataway, NJ, USA). RT-PCR was carried out in a 20 µl reaction using 1 U of Taq DNA polymerase (Genscript, Piscataway NJ, USA), 1× Taq buffer, 0.2 mM dNTPs, 5.0 µM random-hexamer primer, and 2 µl of template cDNA. Thermal cycling conditions were 35 cycles of 95 °C for 30 s, annealing 45 °C for 30 s, and extension 72 °C 30 s. The PCR products were analyzed on 1% agarose gel and smears of DNA (ranges from 300 to 1500 bp) were purified using gel extraction kit (Promega, USA).
Purified DNA was quantified using Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA). A total of 19 individual DNA libraries for each of the samples were prepared using Nextera XT DNA Library Preparation Kit (Illumina, CA, USA). Individual samples were dual-indexed with unique barcodes for identification purposes of recovered complete genome from each individual sample. Amplified and index adapted libraries were purified using AMpure XP beads and quantified using Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA. USA). Individual libraries were pooled and sequenced using pair-end sequencing (2 × 150 bp) using single flow cell on the Illumina MiSeq (San Diego, CA, USA).
Assembly and mapping of Illumina sequencing reads. Raw reads from Illumina sequencing were trimmed and assembled using CLC Genomics Workbench (CLCGW v12.0.3) (Qiagen, MD, US). Both forward and reverse sequences were paired with the following parameters: paired end (forward-reverse) minimal distance 1 to maximum distance 1000, while quality scores were kept at Illumina Pipeline 1.8 and later. Paired reads were assembled using the following parameters: reads were assembled using the de novo assembly function on CLCGW with default graph parameters (automatic word size, and automatic bubble size parameters), and for paired reads, auto-detect paired distances and perform scaffolding functions were selected. Minimum contig's length was set to 200, mismatch cost two, insertion cost three, deletion cost three, length fraction 0.5, and similarity fraction 0.99. Reference-based mapping was then carried out using complete genome sequences available on GenBank for several maize infecting viruses including MDMV, MCMV, SCMV and WSMV (MK282423, www.nature.com/scientificreports/ JX286709, KP860936, NC_001886). Mapping parameters were set as follows: for read alignment, match score was set at one while mismatch cost two with linear gap cost function selected with an insertion cost of three, deletion cost of three, length fraction at 0.5, and similarity fraction 0.9. The consensus contig from the mapping was obtained and inspected for ambiguities, which were corrected with reference to the original assembly or mapping. The open reading frame (https:// www. expasy. org/) and annotation of the final sequences was carried out.
Phylogenetic analysis. Nucleotide (nt) sequences of all 19 MDMV isolates obtained in this study and nine (including our previously MDMV-Bixby1 isolate) 25 complete genome sequences (Table 1) available in the nucleotide sequence repository, GenBank were aligned with the ClustalW.v2 program 41 . Initially, neighbor joining (NJ) trees were developed for both complete genomes and individual genes. Then, based on the best model selection obtained with MEGA6, a Maximum Likelihood (ML) tree was constructed using the Tamura Nei model with 1000 bootstrap replicates for complete open reading frame (ORF) nt sequences. Individual gene trees were also developed using MEGA6 according to the best fitting model for each individual gene. For all phylogenetic trees, Sorghum mosaic virus (SrMV, accession no. U57358) was used as an out-group.

Population genetic parameters.
To understand the selection pressure on each coding region and complete ORF of MDMV isolates, the ratio of nonsynonymous (dN) to synonymous (dS) substitutions was calculated using DnaSP 5.10 42 . Site-specific selection pressure was measured for each codon site using different codon-based maximum-likelihood methods, SLAC, FEL, and MEME for positive selection with p = 0.1 available in Datamonkey 2.0 online version www. datam onkey. org 43 . Nucleotide diversity (π) was calculated as average number of nucleotide differences per site between two sequences 44 .
Neutrality tests. Neutrality tests such as Tajima's D 45 , and Fu and Li's D 46 were also performed in DnaSP5.10 to understand the departure from neutrality for all mutations between group specific MDMV polyprotein sequences.
Genetic differentiation and gene flow. The degree of genetic differentiation and gene flow among different populations of MDMV was measured in DnaSP5.10 using fixation index statistics (FST) as described in Ref. 47 .
Recombination analysis. Recombination analysis were performed by RDPv4.69 48 , including seven different recombination detection algorithms (RDP, GENECONV, Chimaera, MaxChi, Bootscane, SISCAN and 3Seq). The complete genome sequences of all 19 MDMV isolates and 9 other complete genomes from GenBank (Table 1) were aligned and used in the RDP program. The recombination events considered acceptable were those detected by at least five or more recombination detection algorithms and were statistically significant with a Bonferroni-correct P value cutoff of 0.05. Ethics statements. Studies involving human and animals subjects. No animals or human studies are presented in this manuscript. Samples collection. All plant samples collected within the states were according to guidelines provided in an approved standard permit P526P-18-04623 issued by USDA-PPQ.

Data availability
The 19 complete genome sequences of MDMV isolates from the US presented in this study were deposited to NCBI database. The accession numbers from MW026050-MW026068 can be found online https:// www. ncbi. nlm. nih. gov/ genba nk/.