Genetic variation architecture of mitochondrial genome reveals the differentiation in Korean landrace and weedy rice

Mitochondrial genome variations have been detected despite the overall conservation of this gene content, which has been valuable for plant population genetics and evolutionary studies. Here, we describe mitochondrial variation architecture and our performance of a phylogenetic dissection of Korean landrace and weedy rice. A total of 4,717 variations across the mitochondrial genome were identified adjunct with 10 wild rice. Genetic diversity assessment revealed that wild rice has higher nucleotide diversity than landrace and/or weedy, and landrace rice has higher diversity than weedy rice. Genetic distance was suggestive of a high level of breeding between landrace and weedy rice, and the landrace showing a closer association with wild rice than weedy rice. Population structure and principal component analyses showed no obvious difference in the genetic backgrounds of landrace and weedy rice in mitochondrial genome level. Phylogenetic, population split, and haplotype network evaluations were suggestive of independent origins of the indica and japonica varieties. The origin of weedy rice is supposed to be more likely from cultivated rice rather than from wild rice in mitochondrial genome level.


Results
Mt variations across the genome. A number of variations in the rice mt genome were observed in the current study. In total, 4,717 variations, including 4,507 SNPs and 210 indels, were identified across the mt genomes of landrace and weedy rice accessions, along with wild rice. However, after removing missing values and heterozygotes, only 264 high-quality (HQ) variations remained (Table 1). Excluding wild rice, we identified 3,960 variations across the landrace and weedy rice accessions, with 203 HQ sites (Supplementary Tables 1 and 2). In total, 3041 variations (76.8% of total) among the 60 rice accessions located in the intergenic regions, while only 23.2% variations are in the gene region (Supplementary Table 1). When we assigned these variations to different  Table 1. Summary and subgroup distribution of the total variations (SNPs and Indels) detected in 60 Korean origin landrace and weedy rice along with 10 wild rice using the mitochondrial genome of Oryza sativa japonica as reference. a HQ variation: High Quality variations number. Here, it refers to the variations without any missing data and heterozygotes. b The landrace and weedy types were investigated using the program PowerCore. c As defined by the program ADMIXTURE using ~1.6 M nuclear genome SNPs. indica (IND type), japonica (JAP type). d Wild rice were from the Xu et al. (Xu et al.) 44 .
groups, excluding the total SNPs, no significant difference was observed in the distribution of variations between landrace and weedy rice ( Table 1). The overall distribution of variations in the total collection and different groups was also targeted based on the reference genome ( Fig. 1), which suggested that the variations occurred with a region-dependent preferential. Few variations were observed in several large regions of the genome, some of which contained no variation (Supplementary Table 3).
Variation architecture at the mt genome level. The nucleotide diversity (pi) of the mt genome in the whole collection and different subgroups (landrace, weedy and wild rice, and indica and japonica) was calculated (Fig. 2). In the whole collection, pi ranged from 0.0176 to 8.23E-06 with 1 kb slide window among whole mt genome. Most of the pi values were lower than 0.005 (Fig. 2a, Supplementary Table 4), while some regions with extremely high diversity were also identified. In most regions of the mt genome, landrace rice showed slightly greater diversity than weedy rice (Fig. 2b Table 7). The average pi of the indica and japonica types was greater in landrace rice than in weedy rice (Fig. 2e, Supplementary Table 8). These results were also suggestive of greater diversity in indica than in japonica. The genetic distance (Fst) among the different groups was also calculated, which revealed a very high level of breeding (low Fst value) between landrace and weedy rice, and a closer association between landrace rice and wild rice (0.1109) than weedy rice and wild rice (0.1228) (Fig. 2f). When we isolated the wild rice into O. rufipogon and O. nivara, and the landrace and weedy rice into indica and japonica (Fig. 2g), the weedy_indica was more distant with O. nivara than landrace_indica, indicating that weedy_indica had much lower level of breeding with O. nivara than the landrace_indica. Similar results were observed among the O. rufipogon and japonica types (landrace and weedy), illustrating that the landrace_japonica was much closer with O. rufipogon than weedy_japonica. However, the breeding level between weedy_japonica and landrace_japonica was much higher than between weedy_indica and landrace_indica. In addition, overall Fst values between O. rufipogon and japonica type were lower than those between O. nivara and the indica type in landrace and weedy rice.
Admixed population structure of landrace and weedy rice. A neighbor-joined phylogenetic tree was constructed in PHYLIP with 1,000 replicates based on the mt genome SNPs, and the consensus tree is shown in Fig. 3a. As illustrated, the japonica, indica, and wild types fell into three subpopulations (POP1, POP2, and POP3, respectively). However, the landrace and weedy indica types were mixed, as were the landrace and weedy japonica types. This admixture was also observed in the population structure estimation with increasing K (number of populations) values from 2 to 6 ( Fig. 3b). When the K value increased to 6, the clear separation of indica and japonica types was preserved; however, no obvious clustering of landrace_indica and weedy_indica be detected in indica group, similar to landrace_japonica and weedy_japonica in japonica group. PCAs (principal component analysis) of the population with and without wild rice were also performed to compare the grouping of the different groups, which revealed that the indica, japonica, and wild types could be grouped (Fig. 3c). However, the landrace or weedy indica and japonica types were mixed together (Fig. 3d), which was consistent with the results from the neighbor-join phylogenetic and population structure analyses.
Phylogenetic relationships in landrace and weedy rice. Phylogenetic analysis of the whole collection was performed using a Bayesian MCMC search with MrBayes 3.2.5 software using the best-fit model K80 (Fig. 4a). In parallel, a ML iterative model-based method (with best-fit model SYM) with a bootstrap of 1,000 replicates to assess the reliability of the phylogeny reconstructed using PhyML was also conducted (Fig. 4b). A tanglegram was then constructed for the two trees generated using the two methods, which showed complete consistency with three major clusters: indica, japonica, and wild types. Although the tree topology structure generated by the two methods differed, the phylogenetic relationships of these accessions were similar. Wild rice was located in the middle with indica and japonica in the two opposite sides, suggesting that indica and japonica may have different origins. In addition, we found no clear separation of landrace and weedy rice within the indica and japonica types, which is indicative of their sophisticated genetic backgrounds.
Haplotype network and population splits based on the mt genome. Given the observed complex genetic relationship of landrace and weedy rice, and to support our results, we conducted a haplotype network analysis and population split test in the current population. The samples were divided into six groups: O. rufipogon, O. nivara, landrace_japonica, landrace_indica, weedy_japonica, and weedy_indica. The haplotype network revealed 11 haplotypes in the whole collection, dominated by two common haplotypes that were detected in the majority of landrace and weedy rice (Fig. 5a). Excluding the three wild types with haplotypes similar to the indica type, other wild rice accessions, including two indica suspected accessions, shared the other minor haplotypes. Generally, haplotype 1 was represented primarily by the indica type, whereas haplotype 2 was detected primarily in the japonica type. Analysis of population splits among the six groups showed that the indica and japonica types were separated by wild rice, which also indicating the different origin of indica and japonica (Fig. 5b).

Discussion and Conclusions
In this report, we mined mt variation in Korean landrace and weedy rice accessions together with 10 wild rice varieties (O. nivara and O. rufipogon). Generally, different mt genome variations were detected among different groups and the wild rice is more polymorphic than indica and japonica (Table 1). Landrace rice has more Scientific RepoRts | 7:43327 | DOI: 10.1038/srep43327  W_073  W_126  W_124  W_121  W_123  L_022  W_122  W_013  ruf_105958  W_125  L_021  L_030  W_010  ruf_P46  niv_89215  ruf_Yuan  niv_106154  niv_106105 niv_80470 Figure 4. A tanglegram phylogenetic analysis using trees from ML and BI methods to compare the differences from two methods. (a) Phylogram and radial tree layout of the ML tree. (b) BI-based tree using the same datasets. Best-fit models were evaluated using jModeltest. The tanglegram was implemented in Dendroscope using a Neighbor Net-based heuristic method, which use line connects the same accession in two trees to see the difference phylogenetic structure. indica, japonica, wild and mix groups were marked with different colours. The accessions named with "L" or "W" indicate the landrace and weedy rice. variations than weedy rice, indicates that landrace rice is more diverse than weedy rice. We also carried out an intersection of different groups, which revealed the wild, indica, and japonica types shared a huge number of variations (1,068, 22.6% of the total; Supplementary Fig. 1). Similar distributions could be observed in two other groups, with 1,543 variations (32.7% of the total) shared among wild, landrace, and weedy rice ( Supplementary Fig. 1); and 1,317 variations (27.9% of the total) shared in indica or japonica varieties of landrace or weedy rice ( Supplementary Fig. 1). Especially among indica and japonica types, specific variations were more common in landrace than in weedy rice. These results also support that the landrace rice is more diverse than weedy rice. Evaluation of diversity and genetic distance of landrace and weedy rice revealed that weedy rice has slightly less nucleotide diversity than landrace rice, and that the landrace indica and japonica types have greater nucleotide diversity than weedy rice (Fig. 2). It is same with our previous result by using nuclear genome of Korea landrace and weedy rice 35 . We found greater genetic distance from wild rice in Korean weedy rice than in landrace rice, with less between landrace and weedy types. The distance between landrace_indica and weedy_indica types was much greater than that between landrace_japonica and weedy_japonica types, which suggested that japonica is less diverse than indica. In addition, genetic distance evaluation suggested that weedy rice was far from wild rice but close to landrace rice, indicated the low breeding level between weedy rice and wild rice.
Population structure analysis suggested that landrace and weedy rice in Korea have mixed genetic backgrounds and cannot be separated in mt genome level (Fig. 3). Not only weedy but landrace rice has admixed population composition, as some varieties covered two or three subpopulation components (Fig. 3b). However, precious little wild rice background was shared in weedy rice accessions, foreboding the distant background of them. PCA of the whole collection indicated that the indica and japonica types showed admixture in landrace and weedy rice accessions (Fig. 3c,d). It supposed to be that the Korean weedy rice may not from the hybridization between wild and cultivated rice. In common, cultivated rice is also hardly to be hybridized with wild rice (O. rufipogon or O. nivara) in Korea, since there's no wild rice in Korean farmers' field in most cases. Another interesting event we found in the population structure was that most landrace or weedy rice only shared the background within landrace_japonica or landrace_indica, which means the indica or japonica of weedy rice may from independent cultivated rice (indica or japonica, respectively). We then conclude that Korean weedy rice not from the wild rice but from the cultivated rice itself.
Phylogenetic analyses using two different methods with different nucleotide substitution models revealed a same phylogenetic structure of all accessions, although the overall tree topology structure was very different (Fig. 4). Three groups (wild rice, indica, and japonica types) were well illustrated and separated. However, consistent with previous results, landrace and weedy rice could not be clearly seperated both in indica and japonica subgroups. This supported the complicated genetic background of landrace and weedy rice in mt genome level. As we know, the genome transfer is easily happened between mt genome and nuclear genome 14,36,37 . Genetic differentiation of nuclear and mt genomes in indica, japonica and wild rice exists and may infers the different transfer patterns 36,38 . Though we cannot totally avoid all interference from the transfer between mt and nuclear genome, we tried to minimize the bias by removing the most easily transferred region in our study. A set of 965 SNPs, which located at mt highly possible non-transfer region (these regions were identified from the Rice Genome Annotation Project, http://rice.plantbiology.msu.edu/annotation_pseudo_organellar.shtml) were isolated and applied for further evaluation. In addition, the same reference information for the SNPs mining and non-transfer regions characterization were employed, which would also reduce the bias. Population structure and phylogenetic analysis revealed that the results were similar by using all dataset and non-transfer dataset (Figs 3 and 4, Supplementary Fig. 2). Three subpopulations, indica, japonica and wild rice can be clustered into the same pattern between the two datasets (K = 5 with non-transfer SNPs and K = 6 with whole SNPs). The similar result by using whole SNP set and non-transfer SNPs indicated that the transfer pattern has limited impact in current study. This also suggested that the variations based method for phylogenetic or evolution studies is feasible, and using all variations in mt genome is also suit for such analysis with high decent accuracy.
Furthermore, we generated a haplotype network of the whole collection, which was dominated by two common haplotypes, including primarily the indica type (haplotype 1) and the japonica type (haplotype 2), respectively. Haplotype 1 harbored 15 accessions, and haplotype 2 covered 46 japonica accessions (Fig. 5). Haplotypes of landrace and weedy rice were contained within the two major haplotypes, and no additional minor haplotype was found in landrace or weedy rice. These results suggest that Korean landrace and weedy rice do not have unique background with each other at the mt genome level, which hold only the haplotypes distributed between indica and japonica.
Current report suggests that mt genome-based analyses can be applied in genetic diversity studies, as well as in population genetics and phylogenetic analyses. Outcomes from the rice mt genomes reveal and support the independent origin of O. sativa L. and also suggest that Korean weedy rice are more likely to be originated from cultivated rice rather than wild rice. Korean landrace and weedy rice have complicated genetic background and different genetic architecture in mt genome. According to a cytoplasmic-genetic male sterility genes study, the weedy rice in different regions most likely originated from local cultivated rice (indica or japonica) and the hybrid rice probably has been involved in the evolution of some weedy rice accessions 39 . These results are also consistent with the outcomes from our report. A lack of relationship between weedy and wild rice was characterized in areas where O. rufipogon is still present 40 , which also supports the conclusion from current report. It indicates that using mitochondrial genome or CMS (cytoplasmic male sterility) genes are a good method for replenishing the evidence from nuclear genome. Apart from nuclear genome-based genomic and evolution studies, we believe this study of the mt genome will increase our understanding of the genomics and evolution of Korea rice.
Scientific RepoRts | 7:43327 | DOI: 10.1038/srep43327 Methods Samples and whole-genome resequencing. A core set containing 137 rice accessions with diverse types (landrace, weedy, bred) previously generated from worldwide varieties collected from the National Genebank of the Rural Development Administration (RDA-Genebank, Republic of Korea) using the program PowerCore 41,42 was selected for whole genome resequencing 43 , in which 60 landrace and weedy type rice accessions were isolated for current mt genome analysis (Supplementary Table 9). In addition, 10 wild rice (Oryza rufipogon and Oryza nivara) from the resequencing set developed by Xu, et al. 44 were also combined in the present study (Supplementary Table 9). Raw data of the 10 wild rice accessions were downloaded from the European Nucleotide Archive (http://www.ebi.ac.uk/ena) under accession numbers SRA023116.
For the 60 landrace and weedy rice accessions, young leaves from a single plant were sampled and stored at -80 °C prior to genomic DNA extraction using the DNeasy Plant Mini Kit (Qiagen). Qualified DNA was used for whole-genome resequencing of the collected rice varieties, with an average coverage of approximately 9× on the Illumina HiSeq 2000 Sequencing Systems Platform (Illumina Inc.).

Data manipulation and variations.
Resequencing raw data of all the accessions were trimmed using Sickle v1.2 45 to remove low-quality reads followed by alignment using BWA v0.6.2 46 to the Nipponbare mt genome sequence (Genbank: NC_011033). A Sequence Alignment/Map (SAM) file was created during the mapping and converted to a binary SAM (BAM) file with sorting. Removal of duplicates and addition of read group IDs were performed using Picard Tools v1.88 (http://picard.sourceforge.net/). Final realignment and identification of variation were performed using GATK software v3 47  Mt genome variations architecture. Statistics evaluation of mt genome nucleotide diversity (π), population genetic distance (Fst), and Tajima's D in the whole collection and different groups were conducted using VCFtools 48 across a sliding window 1000 bp in length with a 500-bp step size. Assessments of the calculation were also conducted in different groups to compare the groups divergence.
Population structure analysis. To estimate individual admixture assuming different numbers of clusters, the population structure was investigated using the program FRAPPE (Frequentist Estimation of Individual Ancestry Proportion), which allow for estimation of founding allele frequencies and individual admixture using maximum likelihood estimates 49 . We increased the coancestry clusters spanning from 2 to 6 and ran analysis with 20,000 iterations. With an increasing K value range from 2 to 6, we could investigate the individual ancestry events in different clusters. A neighbor-join phylogenetic tree of the landrace, weedy and wild rice was constructed using PHYLIP package (Phylogeny Inference Package v3.695, http://evolution.genetics.washington.edu/ phylip.html) with 1000 replicates. Principal component analysis (PCA) was conducted using TASSEL 5 50 based on the variations that will provide evidence and complement the population structure analysis.
Mitochondria-based phylogenetic, haplotype network and population splits. ML and BI methods were applied to construct phylogenetic trees for the current collection. Briefly, appropriate nucleotide substitution models were assessed using jModeltest 2.1.7 51,52 . To perform phylogenetic analyses, indels were excluded to eliminate potential errors, as the software does not process indels well. A ML tree was conducted using PhyML 3.0 53 complemented by the best nucleotide substitution model GTR+ G selected by the hierarchical LRT (Hierarchical Likelihood Ratio Test) 54 , the Akaike Information Criterion (AIC) 55 and Bayesian Information Criterion (BIC) 56 with 1000 bootstrap replicates. A Bayesian tree was constructed using MrBayes 3.2.5 57 implemented with a Bayesian MCMC search, with two parallel runs of 2 million generations and four chains each. Best-fit model GTR+ G were selected according to the Bayesian Information Criterion (BIC) 56 and compared with the tree generated by the ML method. The phylogenetic tree was displayed and modified using Figtree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/). The consensus tree of the bootstrap in the ML method was integrated using Phylip software. Tanglegram for two trees was implemented in Dendroscope 58 using a Neighbor Net-based heuristic, which is one good way to visualize similarities and differences between two phylogenetic trees side by side connected with lines between taxa that correspond to each other.
A statistical model for estimating the historical relationships among populations, using a graph representation that allows both population splits and migration events was conducted using TreeMix v1.12 59 . In this model, by using genome-wide allele frequency data and a Gaussian approximation to genetic drift, the structure of the graph that showing that relationships between sampled populations and their ancestral populations was inferred. Haplotype network was conducted using PopART (Population Analysis with Reticulate Trees, http://popart. otago.ac.nz) according to the groups in an Integer Neighbor-Joining method.