Complete chloroplast and ribosomal sequences for 30 accessions elucidate evolution of Oryza AA genome species

Kim, Kyunghee; Lee, Sang-Choon; Lee, Junki; Yu, Yeisoo; Yang, Kiwoung; Choi, Beom-Soon; Koh, Hee-Jong; Waminal, Nomar Espinosa; Choi, Hong-Il; Kim, Nam-Hoon; Jang, Woojong; Park, Hyun-Seung; Lee, Jonghoon; Lee, Hyun Oh; Joh, Ho Jun; Lee, Hyeon Ju; Park, Jee Young; Perumal, Sampath; Jayakodi, Murukarthick; Lee, Yun Sun; Kim, Backki; Copetti, Dario; Kim, Soonok; Kim, Sunggil; Lim, Ki-Byung; Kim, Young-Dong; Lee, Jungho; Cho, Kwang-Su; Park, Beom-Seok; Wing, Rod A.; Yang, Tae-Jin

doi:10.1038/srep15655

Article
Open access
Published: 28 October 2015

Scientific Reports volume 5, Article number: 15655 (2015)

Abstract

Cytoplasmic chloroplast (cp) genomes and nuclear ribosomal DNA (nR) are the primary sequences used to understand plant diversity and evolution. We introduce a high-throughput method to simultaneously obtain complete cp and nR sequences from Illumina whole-genome sequence data. We applied the method to 30 rice specimens belonging to nine Oryza species. Concurrent phylogenomic analysis using cp and nR of several specimens of the same Oryza AA genome species provides insight into the evolution and domestication of cultivated rice, clarifying three ambiguous but important issues in the evolution of wild Oryza species. First, cp-based trees clearly classify each lineage but can be biased by inter-subspecies cross-hybridization events during speciation. Second, O. glumaepatula, a South American wild rice, includes two cytoplasm types, one of which is derived from a recent interspecies hybridization with O. longistaminata. Third, the Australian O. rufipogon-type rice is a perennial form of O. meridionalis.

Introduction

Plant cells contain three genomes with different evolutionary origins and history: nuclear, mitochondrial and chloroplastic. Chloroplast (cp) genomes and nuclear ribosomal DNA (nR) units are the primary sequences used to analyze plant genetic diversity as well as evolution^1,2 because mitochondrion (mt) genomes show hyper-variable structure in plant genomes³. The cp genomes are 120- to 217-kb circular DNA molecules containing ~30 conserved genes and relatively diverse intergenic spaces (IGSs)^4,5,6,7,8,9 and maintained uni-parentally, usually via maternal inheritance^10,11. Within plant nuclear genomes, nR is organized into highly abundant tandemly-repeated transcription units¹². Due to their conserved roles in ribosome assembly and nucleolus formation, these high-copy nR units have remained homogeneous through concerted genome evolution within species. Four nR gene components usually reside in two independent chromosomal locations, namely the 5S nR (5S) and 45S nR (45S) blocks in higher plants, although some ancient plants such as Ginkgo biloba, moss and algae maintain the 5S and 45S components in one tandem unit^13,14. The 45S blocks include tandemly arrayed copies of the 45S cistron unit, which comprises conserved 18S, 5.8S and 26S gene clusters, relatively variable internal transcribed spacers (ITS1 and ITS2) and IGSs^13,15.

Although next-generation sequencing (NGS) technology has enabled remarkable progress in understanding nuclear genomics, sequencing of cp genomes and nR units has remained challenging due to their high-copy characteristics. Whereas more than 500 complete cp genome sequences have been reported in GenBank, complete 45S unit sequences are known for only a few species. Most reported cp genome sequences have been obtained by conventional methods^16,17, but recently several studies have utilized NGS platforms to obtain complete cp genome sequences using isolated chloroplast DNA or reference cp-guided mapping, followed by significant efforts to fill gaps using PCR and Sanger sequencing^18,19,20,21,22,23. More recently, the efficiency of obtaining complete cp genome and nR sequences has been improved by optimizing reference-guided mapping with several criteria and a dedicated computing program^24,25,26,27. Reference-guided mapping is a good approach for studies of related species for which reference sequences are known. However, if related reference genome sequences are lacking, de novo assembly is the best way to obtain the complete sequences.

Plant whole-genome shotgun (WGS) sequence data produced by NGS technologies always contain cp sequences at various levels, depending on the tissue types and extraction methods used for DNA preparation. Here, we have developed an efficient method, named de novo assembly of low coverage WGS (dnaLCW), to assemble those short NGS reads to simultaneously obtain high-quality complete cp genome and nR units. We provide solutions for gap-filling and error correction in sequence assembly without additional efforts such as PCR and Sanger sequencing. We successfully generated new complete cp genome and nR unit sequences for more than 50 species/cultivars with a range of genome sizes. This method greatly facilitates the use of highly informative plastome and nR data to elucidate the evolution of land plants. We have applied this method to our own NGS sequences as well as to publicly available NGS sequences for Oryza species.

Rice is the most important staple crop for human consumption worldwide. Cultivated rice includes three species/subspecies, Oryza sativa ssp. japonica (North Asian rice), O. sativa ssp. indica (South Asian rice) and O. glaberrima (African rice), all of which contain the Oryza AA-type nuclear genome. The Oryza AA genome group consists of eight diploid species distributed worldwide: O. barthii (Africa), O. glaberrima (Africa), O. glumaepatula (South America), O. longistaminata (Africa), O. meridionalis (Australia), O. nivara (Asia), O. rufipogon (Asia and Australia) and O. sativa (Asia and now worldwide)²⁸. The AA genome group is estimated to have diverged 2–3 million years ago (MYA)^28,29 and the eight species likely evolved with unique adaptive traits for each native region over time^28,30. Although molecular data have improved our understanding of the phylogenetic relationships and evolutionary history of the Oryza AA genome group, the taxonomic classifications and evolutionary relationships for all eight species in the group have not been fully resolved.

Here, we report cp and nR sequences for 30 Oryza accessions and describe the phylogenomic relationship of wild and cultivated species of the Oryza AA genome. We also discuss the origin of cultivated rice and some ambiguous issues for classification and evolution of wild Oryza AA genome species.

Results

De novo assembly of low coverage WGS

We used rice reference cultivar ‘Nipponbare’ (NP)³¹ in order to test whether high-copy components such as cp, mt and nR sequences could be assembled from low-coverage WGS data. In de novo assemblies of rice 1x haploid genome-equivalent WGS data, among the 30 longest assembled contigs were 5, 15 and 1 contigs representing cp, mitochondrial (mt) and nR sequences, respectively, with the remaining 9 contigs representing major repeats, mainly transposable elements (TEs) (Fig. 1a and Supplementary Table S1 online). Importantly, the five cp contigs covered the entire 134,551-bp cp genome with approximately 20-bp overlap between adjacent contigs (Fig. 1b). One 6,889-bp contig covered most of the 45S nR unit (i.e. 86%), while 15 contigs (summing to 130 kb) provided partial coverage of the mt genome (i.e. 26%). Similar results were obtained from de novo assembly of 151.5 Mbp Panax ginseng (ginseng) WGS data (0.05x whole genome coverage) where 3, 12 and 1 contigs represented cp, mt and nR sequences, respectively and the remaining 14 contigs were classified as unknown (Fig. 1a and Supplementary Table S2 online). The complete cp genome was covered by three contigs that overlapped slightly (Fig. 1c) and 10 kb mate-pair read mapping showed that the three contigs were ordered properly (Fig. 1d). One 9,423-bp contig represented the 45S unit and 12 contigs (38 kb) represented the mt genome.

Optimization of dnaLCW to obtain complete cp genome sequence

Because we obtained almost complete cp and nR sequences for rice and ginseng with 1x and 0.05x genome equivalent WGS data despite their different genome sizes (430 Mbp and 3,600 Mbp for rice and ginseng, respectively)^31,32, we optimized the WGS dataset size needed to obtain complete cp genome assemblies. As NP and ginseng WGS reads included ~1.7 and 6.0% cp genome-derived reads, respectively, we extracted 10 WGS datasets with between 25x and 5,000x coverage of the cp genome for independent assembly (Supplementary Table S3 online).
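The relationship between whole-genome coverage and cp coverage that underlies this dataset sizing can be illustrated with a short calculation. The following is a minimal sketch, not part of the published workflow: the function names are hypothetical, the inputs are the rice figures quoted in the text (430-Mbp genome, 134,551-bp cp genome, ~1.7% cp-derived reads), and it assumes that the fraction of cp-derived reads equals the fraction of cp-derived bases.

```python
import math


def cp_coverage(wgs_bp: float, cp_read_fraction: float, cp_genome_bp: int) -> float:
    """Expected cp sequencing depth for a given amount of WGS data.

    Assumes (illustratively) that the fraction of cp-derived reads equals
    the fraction of cp-derived bases, i.e. uniform read length.
    """
    return wgs_bp * cp_read_fraction / cp_genome_bp


def wgs_bp_for_target(target_cp_cov: float, cp_read_fraction: float, cp_genome_bp: int) -> float:
    """Amount of WGS data (bp) needed to reach a target cp depth."""
    return target_cp_cov * cp_genome_bp / cp_read_fraction


# Values quoted in the text for rice cultivar Nipponbare.
RICE_GENOME_BP = 430e6
RICE_CP_BP = 134_551
RICE_CP_FRACTION = 0.017

# cp depth expected from 1x haploid genome-equivalent WGS data.
one_x_cp_depth = cp_coverage(RICE_GENOME_BP, RICE_CP_FRACTION, RICE_CP_BP)

# WGS data needed for 100x cp depth, expressed in genome equivalents.
genome_equiv_for_100x = wgs_bp_for_target(100, RICE_CP_FRACTION, RICE_CP_BP) / RICE_GENOME_BP

print(f"1x genome equivalent gives about {one_x_cp_depth:.0f}x cp depth")
print(f"100x cp depth needs about {genome_equiv_for_100x:.1f}x genome equivalents")
```

Under these assumptions, 1x genome-equivalent rice data yields roughly 54x cp depth, so the same two functions can be used to pick subsample sizes for any target cp coverage once the cp read fraction of a library has been measured.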

We used the number of contigs covering the entire cp genome and the number of assembly errors as criteria for assessing optimal assembly. Datasets 3–6, with 100x to 250x cp coverage (corresponding to 2–10x haploid genome equivalents for rice), showed the best assembly performance for cp genomes, whereas assembly errors and contig numbers in rice rapidly increased when NGS reads reached 20x whole genome coverage (~8.6 Gbp WGS sequence) (Supplementary Table S3 online). This suggests that, with higher amounts of rice input data, short NGS reads originating from nuclear or mitochondrial plastid DNAs (NMPTs; cp sequences inserted into the nuclear or mitochondrial genome) were erroneously co-assembled into cp contigs. The different assembly behavior with regard to input data could be attributable to rice having a higher NMPT content than ginseng, in which the number of assembly errors decreased as cp coverage increased (Supplementary Table S3 online). Therefore, it is important to use the proper amount of data for assembly to minimize erroneous cp contigs caused by NMPTs.

We compared the performance of two popular genome assemblers, SOAPdenovo³³ and the CLC de novo assembler (http://www.clcbio.com/products/clc-assembly-cell/), in generating small numbers of longer contigs to cover the entire cp genome using various WGS datasets of rice. The CLC de novo assembler outperformed SOAPdenovo (Supplementary Fig. S1 online).

Identification and correction of de novo assembly errors

We could construct a single circular draft cp genome by joining the initially assembled overlapping cp contigs. However, by aligning PE reads onto the assembled contigs, we identified several types of assembly errors, such as those at the positions denoted by arrows in Fig. 1b,c. The mis-assembled regions were typically characterized by accumulation of discordantly mapped reads or abnormally high read mapping depth. The identified assembly errors included false gaps, false SNPs and copy number errors for tandem repeats (TR) or monopolymers. We developed detailed in silico methods for identification and correction of each type of error (see Materials and Methods, Supplementary Figs S2-S5 online). Using the dnaLCW approach followed by in silico correction of seven errors detected in the initial assembly, we obtained a complete 134,551-bp cp genome sequence for NP that was 100% identical to the reference cp sequence of NP (GU592207).

Obtaining complete sequences for major nR units

The dnaLCW assembly also generated contigs representing the 5S and 45S nR units. The initial 5S contigs contained the complete 5S units of 324 bp and 898 bp for NP and ginseng, respectively (Supplementary Figs S6 and S7 online). By contrast, the 45S unit was represented by incomplete contigs longer than 6 kb, including the main 45S transcriptional unit and part of the flanking IGS. We developed a method to extend the IGS sequences based on the highly homogeneous, tandemly arrayed nature of the 45S. We generated a two-unit 45S tandem array using the initial contig and manually inserted 100 unknown nucleotides, (N)₁₀₀, between the two units to represent the remaining gap in the IGS (Fig. 2). We then applied iterative gap closing to fill the gap between the units using Gapcloser with the raw reads. Occasionally, GC-rich regions and sub-repeat elements in the IGS made gap-filling ineffective (Fig. 2d,e); however, we successfully obtained representative complete 324-bp 5S and 7,928-bp 45S units from NP that were identical to the 5S and 45S tandem arrays found on chromosomes 11 and 9, respectively (Supplementary Fig. S6 online)³⁴.
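The construction of the two-unit scaffold described above can be sketched in a few lines. This is an illustrative example only: the function names are hypothetical, the toy sequence stands in for a real 45S contig, and the subsequent gap closing with Gapcloser is an external step that is not shown.

```python
import re


def make_tandem_scaffold(unit_contig: str, gap_len: int = 100) -> str:
    """Join two copies of an incomplete repeat-unit contig with a run of Ns.

    The N run marks the unresolved part of the IGS between adjacent units,
    which a gap-closing tool can later try to fill from raw reads.
    """
    unit = unit_contig.upper()
    return unit + "N" * gap_len + unit


def find_gaps(seq: str) -> list[tuple[int, int]]:
    """Return half-open (start, end) coordinates of every run of Ns."""
    return [(m.start(), m.end()) for m in re.finditer(r"N+", seq)]


# Toy stand-in for an initial 45S contig (a real contig would be >6 kb).
toy_contig = "acgtacgt"
scaffold = make_tandem_scaffold(toy_contig)

print(len(scaffold))        # 8 + 100 + 8 bases
print(find_gaps(scaffold))  # location of the placeholder gap
```

After each round of gap closing, `find_gaps` can be re-run on the updated scaffold to check whether the placeholder has shrunk or disappeared, which mirrors the iterative use described in the text.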

Complete cp and nR sequences of 30 Oryza species

We next applied our method to generate complete cp genome and nR sequences for an additional 29 Oryza accessions, including five O. sativa cultivars (one japonica cultivar, two indica cultivars and two cultivars derived from an indica x japonica hybrid) and 24 Oryza wild relatives, using WGS data produced by four independent groups in Korea, USA, Australia and China (Table 1 and Supplementary Figs S8 and S9 online)^30,35,36,37. The cp genomes varied from 134,296 bp [O. glumaepatula (IRGC88793)] to 134,678 bp [O. barthii (WAB0028903, WAB0028952)] among the Oryza species, and representative InDels between species are shown in Supplementary Fig. S10 online. The complete 5S units varied from 302 to 499 bp due to sequence divergence in the IGS, although the coding sequence was highly conserved (Table 1 and Supplementary Fig. S7 online). The 45S units were 7,745–8,190 bp and sequence variations were more frequent in the IGS region (Table 1 and Supplementary Fig. S11 online).

Table 1 Statistics for assembly of cp and nR sequences from 30 Oryza species.

Phylogenomic analysis of cultivated rice including indica-japonica hybrid cultivars and their ancestors

We obtained cp and nR sequences for 30 accessions that belong to nine Oryza species. The nine Oryza species included all eight species belonging to the Oryza AA genome group, as well as O. punctata, the best outgroup Oryza BB-genome species for phylogenomic analysis of AA-genome species^28,29.

The phylogenomic analyses based on the cp genomes and on 45S sequences agreed with each other for the most part, with the exception of the placement of japonica-indica hybrid M23 and one O. rufipogon accession (Fig. 3a,b). The cp genome-based tree clearly distinguished O. sativa subspecies japonica (NP, Yukara) and indica (IR8, TN1) from each other. Cultivars Tongil and M23, bred by crossing O. sativa ssp. japonica and indica, had cp genomes identical to the indica and japonica types, respectively, in accordance with their last maternal parent even though both show indica-like plant architecture and nuclear genome sequence (Fig. 3c,d and Supplementary Figs. S8 and S9 online)^38,39. In the 45S-based phylogenetic analysis, M23, the cultivar derived from japonica x indica hybridization, belonged to the indica group, in accord with its phenotype even though its cp genome was identical to those of japonica cultivars due to maternal inheritance³⁶. Conversely, the O. rufipogon used in this study was grouped with japonica based on 45S, but with indica based on the cp genome (Fig. 3a,b).

The African cultivated rice O. glaberrima and its wild relative O. barthii were grouped together by both cp and nR-based trees (Fig. 3a,b). It was previously reported that O. barthii accessions are divided into five independent subgroups (OB-I to OB-V) and that O. glaberrima was domesticated from OB-V⁴⁰. Our cp-based phylogeny showed that all the O. glaberrima accessions are grouped with the O. barthii OB-V group, and the 45S-based phylogeny is in accordance with the cp-based tree, although the OB-IV and OB-V groups are not distinguished (Fig. 3b).

Phylogenomic analysis of wild Oryza AA genome species

Our concurrent cp- and 45S-based phylogenomic analysis revealed that cp-based trees can be biased by a single inter-subspecies or inter-species cross-hybridization event during speciation, as shown by one artificially bred cultivar, M23, developed by hybridization between japonica and indica rice (Fig. 3). Accordingly, we next included several different accessions as representatives of each species to clarify the phylogenomic relationships using cp as well as nR sequences. The topology of the cp-based tree was in good accordance with the 45S-based tree as well as with previous reports, with some exceptions (Fig. 4). In addition, phylogenetic analysis based on the maximum likelihood (ML) method (Supplementary Fig. S12 online) produced a topology similar to that in Fig. 4. All accessions belonging to the same species were placed into the same clade, with two exceptions. Three O. rufipogon specimens were positioned independently by both cp and 45S trees. Accession no. 7 (Chinese rufipogon) was intermingled with Asian cultivated rice species. Accession no. 8 (Vietnamese rufipogon) was intermediate between Asian and African cultivated rice species. Accession no. 9 (Australian rufipogon) was grouped with Australian wild rice, O. meridionalis. We also analyzed four different O. glumaepatula accessions, and the four South American wild rice accessions were grouped as two independent groups by the cp-based tree. Among the four O. glumaepatula accessions, nos. 23 and 24 were placed between O. meridionalis (Australian AA) and O. barthii (African AA), whereas accessions nos. 25 and 26 were grouped with the basal AA genome species O. longistaminata in the cp-based tree. However, all four accessions were grouped together and independently from O. longistaminata by the 45S tree.

Discussion

The dnaLCW workflow for simultaneous determination of complete cp and nR sequences

Currently, most leading NGS-read assembly programs use a computational algorithm known as a de Bruijn graph and 15 assemblers have been developed to improve genome assembly⁴¹. Repeat sequences generally hinder genome assembly; accordingly, many efforts have focused on removing repeat sequences to avoid the noise they create^42,43,44,45. Conversely, we have developed an efficient workflow to obtain complete cp and nR sequences simultaneously by taking advantage of the high copy genomic elements and using subsequent in silico solutions for error correction (see Materials and Methods). We used standard procedures for DNA preparation, PE library construction and Illumina sequencing. Small amounts of NGS data from WGS reads sufficed to assemble complete cp and nR sequences using our approach.

Origin of cultivated rice species revealed by concurrent cp and 45S-based phylogenomics

Oryza species are self-pollinating. The maternally inherited cp genome often most accurately represents the lineage, whereas nuclear rDNA could be intermingled by chance cross-hybridization between evolving sub-groups. However, our results demonstrated that cross hybridization between diverging groups can also produce altered genotype results in cp genome-based phylogenies, as exemplified by the biased positioning of one japonica-indica hybrid cultivar, M23 (no. 3 in Figs 3 and 4).

During the last four decades, there have been many breeding efforts involving inter-subspecies hybridization between indica and japonica. Two famous rice cultivars, Tongil and M23, were bred by inter-subspecies hybridization and subsequent inbreeding and selection in Korea (Fig. 3c,d)³⁸. Both cultivars have an indica-type appearance even though a small portion of each genotype is derived from japonica-type rice³⁹. Our cp-based phylogenetic analysis revealed that the Tongil cp genome is identical to those of the indica rice accessions. By contrast, the M23 cp genome is identical to those of the japonica accessions rather than those of the indica group, a finding that contradicts the 45S-based tree as well as the phenotype and genotypes (Fig. 3a,b)³⁹. However, this finding is consistent with the breeding history: Tongil likely contains the indica-type cp genome and M23 the japonica-type cp genome because they were maternally inherited from the inter-subspecies hybridization (Fig. 3c,d). Based on our results, we conclude that the cp genome reports genetic diversity well for most plant species; however, the cp-based phylogeny can sometimes be biased if there was an inter-species cross-hybridization event, as in the case of M23. Similar results are observed for some accessions that were estimated to be derived from natural inter-species pollination: one O. rufipogon accession (no. 7) and two O. glumaepatula accessions (nos. 23 and 24) (red lines in Fig. 4). Our data illustrate that phylogenomic analysis based on simultaneous use of both cp and nR sequences will further promote elucidation of the relationships among closely related species over approaches using only one type of sequence.

Genome-wide analysis of large collections of O. sativa relatives has revealed that Asian O. rufipogon genomes are the most diverse and are classified into three major groups, of which Or-I/II and Or-III are thought to be ancestors of the indica and japonica types of rice, respectively and O. rufipogon W1943 accession (no. 7) is classified as Or-III ecotype³⁶, which coincides with our phylogenetic analysis with 45S. However, our cp-based phylogeny indicated that O. rufipogon W1943 is close to the indica type.

Although the 45S-based phylogeny did not distinguish the OB-IV and OB-V groups, our cp-based phylogeny clearly distinguished the five O. barthii subgroups and demonstrated that the African cultivated rice was domesticated from the OB-V group, in agreement with a previous report (Fig. 3a,b)⁴⁰.

Phylogenetic relationships of Oryza AA genomes

Comparison of genome sequences of Oryza sativa and five related AA genome species revealed the rapid diversification of each species³⁰. Phylogenetic analyses based on 53 conserved nuclear genes placed O. meridionalis as the basal species of the AA genome group^29,30. By contrast, our analysis showed that O. longistaminata is the basal AA genome species, based on the cp genome as well as nR sequence (Fig. 4). Although there is contradictory positioning of O. longistaminata and O. meridionalis between these two analyses (our cp-based tree vs. ref. 44), the overall topology is in agreement with other previous reports^46,47,48,49,50,51. The cp-based phylogeny also showed very rapid divergence of AA genome species during the last 2.3 million years, in agreement with data based on the nuclear genome (Fig. 4)^29,30.

In this work, we analyzed several different specimens as representatives of the same species, an approach that differs from previous studies and promotes better understanding of the divergence of wild species. Our analysis revealed that there were very recent cp genome exchanges between diverging species, such as two O. glumaepatula accessions (nos. 23 and 24), which have O. longistaminata-like cp genomes even though their rDNA and phenotypes are similar to those of two other distinct O. glumaepatula accessions (nos. 25 and 26) (Fig. 4). We assume that there was a cross-pollination event between 0.3 and 0.5 million years ago (MYA), with some ancestor of O. glumaepatula (such as accessions nos. 23 and 24) as the male and O. longistaminata as the female (Fig. 4). As described above, we identified and confirmed a similar phenomenon in one inter-subspecies hybridization-derived cultivar, M23, which was bred by cross-hybridization between O. sativa ssp. japonica x O. sativa ssp. indica according to the breeding history (Fig. 3c).

Three O. rufipogon specimens were placed in different groups. The Chinese O. rufipogon specimen (W1943, no. 7) was grouped with O. nivara and O. sativa. A wild specimen from Vietnam (AC11-1008369, no. 8) was placed independently as a basal group to O. sativa, indicating that it is a wild ancestor of O. sativa, in agreement with previous reports (Fig. 4a)^36,52,53. However, another wild O. rufipogon specimen from Australia (AC01-1002323, no. 9) was grouped with O. meridionalis, indicating that the naming of the species should be reconsidered, consistent with a previous suggestion that the Australian O. rufipogon may be a perennial form of O. meridionalis⁵².

Advantage of complete cp genome information for barcoding within species

The diversity of the cp genome at the genus or species level is advantageous for analyzing the domestication of crop plants and for developing barcoding markers for certain cultivars. The origin of domesticated apples has been clarified based on 47 cp genome sequences of Malus species²³. The cp genomes of ginseng and American ginseng (P. quinquefolius) showed 138 SNPs and 40 InDels relative to one another (Supplementary Figs. S13 and S14 online). Although there is abundant polymorphism between different Panax species, previous PCR surveys did not detect polymorphism among P. ginseng cultivars in the cp intergenic regions^54,55. In this study, we could identify one SNP and two InDels between the P. ginseng cultivars ChP and YP (Supplementary Figs. S13 and S14 online) and one SNP and three InDels between ChP and three reported P. ginseng specimens from China (GenBank Accession nos. KC686331, KC686332, KC686333). We further identified a total of six SNPs and six InDels by comparison of nine more complete cp genomes of P. ginseng cultivars or landraces⁵⁶. By contrast, when compared with the previously reported wild P. ginseng cp genome (NC_006290)⁵⁷, our P. ginseng cp genome sequences showed 117 SNPs and 51 InDels, likely due to differences in plant material and/or sequencing errors (Supplementary Fig. S13 online). We also applied this approach to complete cp genomes of tartary buckwheat and three onion accessions to identify useful barcoding markers to classify adjacent species⁵⁸ and different onion cytotypes⁵⁹.

Evolution of nR DNA

The copy number of nR units varies among plant genomes. The copy number of each nR unit was estimated based on the average depth of coverage by 1x genome-equivalent WGS reads (Table 1), and the value was roughly in accord with, but slightly lower than, estimates derived from FISH signals, likely because FISH amplifies the hybridization signal by rendering a two-dimensional signal from the three-dimensional chromosome structure (Table 1 and Supplementary Fig. S15 and Table S4 online). We found a remarkable range of estimated copy numbers for 5S and 45S. The 5S and 45S copy number estimates ranged from 69–6,045 and 131–2,292, respectively, among Oryza species (Table 1). This raises the question of whether there is any association of rDNA diversity and copy number variation with biological function.
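The depth-based estimate described above amounts to dividing the mean read depth over the nR unit by the number of genome equivalents sequenced. The sketch below illustrates that arithmetic only; the function name and the depth values are hypothetical, and it assumes a single-copy nuclear locus would have a depth equal to the genome-equivalent coverage.

```python
from statistics import mean


def estimate_copy_number(unit_depths: list[float], genome_equivalents: float = 1.0) -> float:
    """Estimate the copy number of a repeat unit from per-base read depth.

    With N genome equivalents of WGS reads, a single-copy locus is expected
    to have depth ~N, so mean depth / N approximates the copy number.
    """
    if genome_equivalents <= 0:
        raise ValueError("genome_equivalents must be positive")
    return mean(unit_depths) / genome_equivalents


# Hypothetical per-base depths across a short stretch of an nR unit.
depths_1x = [100, 120, 110]
print(estimate_copy_number(depths_1x, genome_equivalents=1.0))

# The same locus sequenced to 2x genome equivalents gives the same estimate.
depths_2x = [200, 240, 220]
print(estimate_copy_number(depths_2x, genome_equivalents=2.0))
```

Because the estimate scales linearly with depth, any mapping bias in GC-rich regions of the unit would carry straight through to the copy number, which is one reason to treat such values as approximate.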

The 5S and 45S units are found in independent chromosomal regions in rice (Supplementary Figs. S6 and S15 online) and in most genomes of higher plants. By contrast, the 45S and 5S units co-exist as one tandem repeat unit in some ancient plants^13,14 and it is assumed that the single unit was divided into separate 45S and 5S units in higher plants during evolution. However, it is not clear how and when the units evolved because of the lack of complete nR unit sequences from diverse plants. Our high throughput approach can thus contribute to revealing how nR evolved in the plant kingdom.

Methods

Preparation of whole-genome NGS reads

Leaf samples were harvested from rice and ginseng plants grown on a farm of Seoul National University, Suwon, Korea and high-quality genomic DNA was extracted using a modified CTAB method⁶⁰. A paired-end (PE) library with 500-bp insert size was constructed using the Illumina PE DNA library kit according to the manufacturer’s instructions and sequenced using an Illumina HiSeq2000 by the National Instrumentation Center for Environmental Management (NICEM, http://nicem.snu.ac.kr/, Korea) and Macrogen (http://dna.macrogen.com/, Korea), or using an Illumina MiSeq or NextSeq500 by LabGenomics (www.labgenomics.co.kr, Korea). Illumina HiSeq2000 reads of six O. sativa and eight related Oryza species were provided by Prof. Hee-Jong Koh (Seoul National University, Korea) and the Arizona Genomics Institute (AGI, http://www.genome.arizona.edu/, USA), respectively. WGS sequence data of additional Oryza species were downloaded from the SRA database and used for assembly of cp genomes and nR sequences (Table 1).

WGS assembly and building of complete cp genome and nR sequences

Raw reads with Phred scores of 20 or less were removed from the total NGS PE reads using the CLC quality trim tool (quality_trim software included in CLC ASSEMBLY CELL package ver. 4.06 beta. 67189, http://www.clcbio.com/products/clc-assembly-cell/). In assemblies of WGS reads representing more than 70x genome coverage in rice and ginseng, we identified no proper long, unique cp contigs. We then tested assembly of the cp genome and nR using low-coverage WGS sequences. Sub-datasets with various levels of cp genome coverage were extracted from trimmed NP and ChP WGS reads and assembled using the CLC de novo assembler included in the CLC ASSEMBLY CELL package or SOAPdenovo included in the SOAP package (ver. 1.12) with default parameters. Sequence gaps were filled by Gapcloser included in the SOAP package (ver. 1.12). Representative contigs for the cp genome or nRs were retrieved from the total assembled contigs using Nucmer⁶¹ with reference sequences. Extracted contigs were ordered and oriented based on built-in BLASTZ analysis (http://nature.snu.ac.kr/tools/blastz_v3.php)⁶² with the cp sequence of a related genome and then connected into a single draft sequence by joining overlapping terminal sequences.
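The final step, joining ordered and oriented contigs through their overlapping ends, can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the tool used in the study: the function names are hypothetical, contigs are assumed to be already ordered and oriented, overlaps are assumed to be exact matches, and the toy sequences are invented for the example.

```python
def terminal_overlap(left: str, right: str, min_len: int = 10, max_len: int = 200) -> int:
    """Length of the longest suffix of `left` that exactly equals a prefix of `right`.

    Returns 0 if no exact overlap of at least `min_len` bases is found.
    """
    upper = min(max_len, len(left), len(right))
    for k in range(upper, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def join_contigs(contigs: list[str], min_len: int = 10) -> str:
    """Merge ordered, oriented contigs by collapsing each terminal overlap."""
    draft = contigs[0]
    for nxt in contigs[1:]:
        k = terminal_overlap(draft, nxt, min_len)
        if k == 0:
            raise ValueError("adjacent contigs share no terminal overlap")
        draft += nxt[k:]
    return draft


def circularize(draft: str, min_len: int = 10) -> str:
    """Trim the end of a linear draft if it repeats the start (circular molecule).

    Returns the draft unchanged when no end-to-start overlap is found.
    """
    k = terminal_overlap(draft, draft, min_len, max_len=min(200, len(draft) - 1))
    return draft[:-k] if k else draft


# Toy circular "genome" cut into three contigs with 5-bp overlaps.
genome = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
contigs = [genome[0:15], genome[10:25], genome[20:30] + genome[0:5]]

draft = circularize(join_contigs(contigs, min_len=5), min_len=5)
print(draft == genome)
```

Real contig ends often carry a few erroneous bases, so a practical implementation would probably tolerate mismatches in the overlap; exact matching is used here only to keep the example short.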

In silico finishing: Identification and correction of errors in dnaLCW assembly

Tentative error sites were identified by mapping raw reads to draft sequences using the CLC mapping tool (clc_ref_assemble in the CLC ASSEMBLY CELL package) and visualized using CLC viewer (clc_assembly_viewer in the CLC ASSEMBLY CELL package). The mis-assembled sequences were characterized by accumulation of discordantly mapped reads or abnormally high read mapping depth. The identified assembly errors were classified as false gaps, false SNPs, or copy number errors for tandem repeats (TR) or monopolymers. Each type of error was corrected by the following in silico manual curation and validated by PCR amplification and Sanger sequencing.
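One of the two signatures mentioned above, abnormally high mapping depth, lends itself to a simple automated screen. The sketch below is illustrative and is not the procedure used in the study, which relied on visual inspection in the CLC viewer: the function name, the median-based threshold and the factor of 2 are all assumptions, and the depth profile is invented.

```python
from statistics import median


def flag_high_depth(depths: list[int], factor: float = 2.0) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals whose depth exceeds factor x median.

    Such intervals are candidates for collapsed repeats or co-assembled
    nuclear/mitochondrial copies and would still need manual inspection.
    """
    threshold = factor * median(depths)
    regions = []
    start = None
    for i, depth in enumerate(depths):
        if depth > threshold:
            if start is None:
                start = i          # open a new flagged interval
        elif start is not None:
            regions.append((start, i))  # close the current interval
            start = None
    if start is not None:
        regions.append((start, len(depths)))
    return regions


# Invented per-base depth profile with one 3-bp spike.
profile = [100] * 10 + [300] * 3 + [100] * 10
print(flag_high_depth(profile))
```

A screen of this kind only narrows down where to look; the classification of each flagged site into one of the error types still depends on inspecting the mapped reads.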

1. False gaps: This type of error occurs in regions where ambiguous “N” nucleotides are present in draft assembly contigs. The left and right sequences flanking an “N” are duplicated, leading to accumulation of commonly mis-mapped reads at the flanking regions (Supplementary Fig. S2 online). Such errors can be corrected by merging the common duplicated sequences flanking the “N”, and the correction can be validated by re-mapping reads on the edited sequence. If the edited sequence is correct, read mapping will show clear matches on the sequence.
2. False SNPs: DNA fragments homologous to those of the cp genome are ubiquitous in mitochondrial and nuclear genomes of rice^63,64 and can interfere with cp genome assembly⁴⁴, creating false SNPs (Supplementary Fig. S3a online). Each false SNP could be corrected by assigning the consensus nucleotide to the false SNP location based on the reads showing the highest depth in the paired read mapping, because ~8–100-fold more reads originate from the cp genome than from the nuclear or mitochondrial genome. For example, the assembly of the Os5 dataset, which provides 4x and 200x coverage of the nuclear and cp genomes, respectively, showed two false SNPs, G/T at 51,940 nt and T/A at 51,944 nt (Supplementary Fig. S3b online). The 212 reads mapped to the region revealed clear patterns of origin, in which 186 reads (from the cp) contained T and A nucleotides at those positions, 24 reads (from the mt) contained G and T and 2 (from the nucleus) contained T and T. Overall, false SNPs in the initial contigs can be easily corrected using read mapping followed by assigning the consensus nucleotide with the highest depth.
3. Tandem repeat copy number error: Copy number errors readily arise during de novo assembly of short reads^42,43,45. In our data, an 18-bp TR was mis-assembled into two copies under the default assembly options, whereas all four copies were correctly assembled with a k-mer length of 64 (Supplementary Fig. S4a online). When repeats are shorter than the read length, increasing the k-mer value above the TR unit length can reduce mis-assembly. Copy number errors in the assembly can be identified by comparing read depth at the TR with that of the flanking regions. If raw reads are mapped to a region assembled with too few copies of a TR, mis-mapped reads will be abundant and abnormally high read depth will be found at the collapsed region (Supplementary Fig. S4b,c online). Most TR units found in cp genomes are simple and shorter than 100 bp, unlike those in the nuclear genome. Therefore, most errors derived from TR copy number variation can be fixed.
4. Monopolymer copy number error: A total of 95 and 91 regions contained monopolymer tracts of more than 8 nt in the cp genomes of NP and ChP, respectively. Most monopolymers were poly A or poly T (Supplementary Table S5 online). Monopolymer regions in the cp genome are susceptible to sequencing errors due to polymerase slippage and to mis-assembly caused by interference from homologous mitochondrial or nuclear sequences containing monopolymers of different lengths. One such assembly error was detected at the poly T tract at 78,424 bp in the NP cp genome (Supplementary Fig. S5a online). Similar sequences with poly T tracts of different lengths (7, 8, 9, 10, 11, 12, 15 and 17 nt) were found in 10 chromosomal regions of the NP nuclear genome (Supplementary Fig. S5b online). The initial assembly of the Os3 dataset generated an erroneous (T)₈ monopolymer tract caused by interference from T monopolymers derived from sequences on rice chromosomes 5, 6, 7 and 9 (Supplementary Fig. S5b online). This error was corrected by mapping raw reads at 100% identity to hypothetical sequences carrying each candidate T monopolymer length and selecting the tract with the highest read depth. Among the eight putative sequences, the draft sequence with the correct (T)₁₇ monopolymer showed the highest mapping depth, 33.14, as expected (Supplementary Fig. S5c online).
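The depth-based rule applied to false SNPs, and again when choosing among candidate monopolymer lengths, can be sketched in a few lines. This is a minimal illustration and not the curation procedure itself, which was manual: the function name `call_by_depth` and the dictionary layout are assumptions, while the read counts are those reported above for the two linked sites in the Os5 dataset.

```python
def call_by_depth(read_counts):
    """Pick the candidate supported by the most reads.

    `read_counts` maps each candidate (an allele, a haplotype, or a
    candidate tract length) to the number of reads supporting it.
    Returns the winning candidate and its fraction of all reads.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads mapped to this site")
    best, support = max(read_counts.items(), key=lambda item: item[1])
    return best, support / total

# Haplotypes at 51,940 nt and 51,944 nt in the Os5 assembly:
# "TA" from the cp genome, "GT" from the mitochondrial genome,
# "TT" from the nuclear genome (counts taken from the text above).
os5_counts = {"TA": 186, "GT": 24, "TT": 2}
haplotype, fraction = call_by_depth(os5_counts)
print(haplotype, round(fraction, 3))
```

With the Os5 counts, the cp-derived haplotype is supported by 186 of 212 reads, roughly 88%, so the organellar and nuclear copies are clearly outvoted. The same function could rank candidate monopolymer lengths if each candidate's mapping depth were supplied in place of read counts.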

Annotation and comparative analysis of cp and nR sequence

The cp genome sequence was annotated using the DOGMA program (http://dogma.ccbb.utexas.edu/)⁶⁵ and BLAST searches. Circular and comparative maps of the cp genome were generated using OGDRAW (http://ogdraw.mpimp-golm.mpg.de/)⁶⁶ and mVISTA (http://genome.lbl.gov/vista/mvista/submit.shtml)⁶⁷, respectively. The structure of the rRNA genes, ITS and IGS in the assembled 45S sequences was determined by comparison with reported sequences and by BLAST searches. Phylogenetic trees were constructed by the neighbor-joining method in MEGA6⁶⁸, and the reliability of internal branches was assessed with 1,000 bootstrap replicates (Fig. 3). A dated phylogenetic tree was generated by Bayesian inference using BEAST (version 1.8.1)⁶⁹ (Fig. 4). We ran the MCMC for 10 million generations, sampling every 1,000 generations, under an uncorrelated lognormal relaxed-clock model with a Yule prior on the tree, the general time reversible (GTR + I + Γ) substitution model and default priors, starting from a randomly generated tree; the effective sample size was 592. The BEAST output was examined using Tracer (version 1.6); the first 10% of generations was discarded as burn-in and the remainder was used to estimate posterior probabilities. Numbers on branches indicate posterior probabilities (pp) greater than 0.5. Divergence times were calculated using TreeAnnotator (version 1.8.1), with the root age constrained to 9 MYA based on the recently reported divergence time between the Oryza AA genome and the BB genome (O. punctata)²⁹.
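The sampling settings above determine how many logged states remain after burn-in, which is useful context for the reported effective sample size. The short calculation below is only a bookkeeping sketch: `mcmc_sample_counts` is a hypothetical helper, not part of BEAST or Tracer, and actual log files may contain one additional entry for the initial state.

```python
def mcmc_sample_counts(chain_length, sample_every, burnin_percent):
    """Return (logged, discarded, retained) sample counts for an MCMC run.

    Integer arithmetic is used throughout to avoid floating-point rounding.
    """
    logged = chain_length // sample_every
    discarded = logged * burnin_percent // 100
    return logged, discarded, logged - discarded

# Settings stated in the text: 10 million generations,
# sampled every 1,000 generations, 10% burn-in.
logged, discarded, retained = mcmc_sample_counts(10_000_000, 1_000, 10)
print(logged, discarded, retained)
```

Under these settings 10,000 states are logged, 1,000 are discarded as burn-in and 9,000 contribute to the posterior summary.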

Validation of polymorphic regions in cp genome sequences

Specific primers were designed from conserved sequences flanking polymorphic regions, such as SNPs and InDels, found among the cp genomes (Supplementary Table S6 online). Genomic DNA was used as the template for PCR amplification with Ex-Taq polymerase (Takara, Japan), and the amplified fragments were analyzed using a Fragment Analyzer (Advanced Analytical Technologies Inc., USA) according to the manufacturer’s instructions. DNA fragments amplified using dCAPS primers were digested with the appropriate restriction enzyme and then separated on the same Fragment Analyzer.

Additional Information

Accession codes: Whole-genome NGS reads used in this study have been deposited in the NCBI Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra/), and the final assembled cp genome and nR unit sequences have been deposited in the GenBank database (for accession numbers, see Table 1).

How to cite this article: Kim, K. et al. Complete chloroplast and ribosomal sequences for 30 accessions elucidate evolution of Oryza AA genome species. Sci. Rep. 5, 15655; doi: 10.1038/srep15655 (2015).

References

Qiu, Y.-L. et al. The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 402, 404–407 (1999).
Soltis, P. S., Soltis, D. E. & Chase, M. W. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402, 402–404 (1999).
Park, J. Y. et al. Complete mitochondrial genome sequence and identification of a candidate gene responsible for cytoplasmic male sterility in radish (Raphanus sativus L.) containing DCGMS cytoplasm. Theor. Appl. Genet. 126, 1763–1774 (2013).
Palmer, J. D. Comparative organization of chloroplast genomes. Annu. Rev. Genet. 19, 325–354 (1985).
Harris, S. A. & Ingram, R. Chloroplast DNA and biosystematics: The effects of intraspecific diversity and plastid transmission. Taxon 40, 393–412 (1991).
Wolfe, A. D. & Randle, C. P. Recombination, heteroplasmy, haplotype polymorphism and paralogy in plastid genes: Implications for plant molecular systematics. Systematic Botany 29, 1011–1020 (2004).
Shaver, J. M., Oldenburg, D. J. & Bendich, A. J. Changes in chloroplast DNA during development in tobacco, Medicago truncatula, pea and maize. Planta 224, 72–82 (2006).
Rivarola, M. et al. Castor bean organelle genome sequencing and worldwide genetic diversity analysis. PLoS ONE 6, e21743 (2011).
Wang, W. & Messing, J. High-throughput sequencing of three Lemnoideae (Duckweeds) chloroplast genomes from total DNA. PLoS ONE 6, e24670 (2011).
Wolfe, K. H., Gouy, M., Yang, Y. W., Sharp, P. M. & Li, W. H. Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc. Natl. Acad. Sci. USA 86, 6201–6205 (1989).
Reboud, X. & Zeyl, C. Organelle inheritance in plants. Heredity 72, 132–140 (1994).
Rogers, S. O. & Bendich, A. J. Heritability and variability in ribosomal RNA genes of Vicia faba. Genetics 117, 285–295 (1987).
Wicke, S., Costa, A., Muñoz, J. & Quandt, D. Restless 5S: The re-arrangement(s) and evolution of the nuclear ribosomal DNA in land plants. Mol. Phylogenet. Evol. 61, 321–332 (2011).
Galián, J. A., Rosato, M. & Rosselló, J. A. Early evolutionary colocalization of the nuclear ribosomal 5S and 45S gene families in seed plants: evidence from the living fossil gymnosperm Ginkgo biloba. Heredity 108, 640–646 (2012).
Álvarez, I. & Wendel, J. F. Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogenet. Evol. 29, 417–434 (2003).
Golenberg, E. M. et al. Chloroplast DNA sequence from a miocene Magnolia species. Nature 344, 656–658 (1990).
Sang, T., Crawford, D. & Stuessy, T. Chloroplast DNA phylogeny, reticulate evolution and biogeography of Paeonia (Paeoniaceae). Am. J. Bot. 84, 1120 (1997).
Burger, G., Lavrov, D. V., Forget, L. & Lang, B. F. Sequencing complete mitochondrial and plastid genomes. Nat. Protoc. 2, 603–614 (2007).
Nock, C. J. et al. Chloroplast genome sequences from total DNA for plant identification. Plant Biotech. J. 9, 328–333 (2010).
Zhang, Y. J., Ma, P. F. & Li, D. Z. High-throughput sequencing of six bamboo chloroplast genomes: phylogenetic implications for temperate woody bamboos (Poaceae: Bambusoideae). PLoS ONE 6, e20596 (2011).
Liu, Y., Forrest, L. L., Bainard, J. D., Budke, J. M. & Goffinet, B. Organellar genome, nuclear ribosomal DNA repeat unit and microsatellites isolated from a small-scale of 454 GS FLX sequencing on two mosses. Mol. Phylogenet. Evol. 66, 1089–1094 (2013).
McPherson, H. et al. Capturing chloroplast variation for molecular ecology studies: a simple next generation sequencing approach applied to a rainforest tree. BMC Ecol. 13, 8 (2013).
Nikiforova, S. V., Cavalieri, D., Velasco, R. & Goremykin, V. Phylogenetic analysis of 47 chloroplast genomes clarifies the contribution of wild species to the domesticated apple maternal line. Mol. Biol. Evol. 30, 1751–1760 (2013).
Straub, S. C. et al. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing. BMC Genomics 12, 211 (2011).
Straub, S. C. et al. Navigating the tip of the genomic iceberg: Next-generation sequencing for plant systematics. Am. J. Bot. 99, 349–364 (2012).
Wysocki, W. P. et al. A multi-step comparison of short-read full plastome sequence assembly methods in grasses. Taxon 63, 899–910 (2014).
Weitemier, K., Straub, S. C., Fishbein, M. & Liston, A. Intragenomic polymorphisms among high-copy loci: a genus-wide study of nuclear ribosomal DNA in Asclepias (Apocynaceae). PeerJ 3, e718 (2015).
Vaughan, D. A., Lu, B. R. & Tomooka, N. The evolving story of rice evolution. Plant Science 174, 394–408 (2008).
Zhu, T. et al. Phylogenetic relationships and genome divergence among the AA- genome species of the genus Oryza as revealed by 53 nuclear genes and 16 intergenic regions. Mol. Phylogenet. Evol. 70, 348–361 (2014).
Zhang, Q. J. et al. Rapid diversification of five Oryza AA genomes associated with rice adaptation. Proc. Natl. Acad. Sci. USA 111, E4954–4962 (2014).
International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 436, 793–800 (2005).
Choi, H. I. et al. Major repeat components covering one third of the ginseng (Panax ginseng C.A. Meyer) genome and evidence for allotetraploidy. Plant J. 77, 906–916 (2014).
Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
Shishido, R., Sano, Y. & Fukui, K. Ribosomal DNAs: an exception to the conservation of gene order in rice genomes. Mol. Gen. Genet. 263, 586–591 (2000).
Ammiraju, J. S. S. et al. The Oryza BAC resource: a genus-wide and genome scale tool for exploring rice genome evolution and leveraging useful genetic diversity from wild relatives. Breeding Science 60, 536–543 (2010).
Huang, X. et al. A map of rice genome variation reveals the origin of cultivated rice. Nature 490, 497–501 (2012).
Krishnan, S. G., Waters, D. L. & Henry, R. J. Australian wild rice reveals pre-domestication origin of polymorphism deserts in rice genome. PLoS ONE 9, e98843 (2014).
Chung, G. S. & Heu, M. H. Improvement of Tongil-type rice cultivars from indica/japonica hybridization in Korea. In Biotechnology in Agriculture and Forestry 14, 105–112 (Springer-Verlag, 1991).
Kim, B. et al. Defining the genome structure of “Tongil Rice”, a cultivar for Korean green revolution, derived from indica x japonica hybridization. Rice 7, 22 (2014).
Wang, M. et al. The genome sequence of African rice (Oryza glaberrima) and evidence for independent domestication. Nat. Genet. 46, 982–988 (2014).
Fan, W. & Li, R. Test driving genome assemblers. Nat. Biotechnol. 30, 330–331 (2012).
Phillippy, A. M., Schatz, M. C. & Pop, M. Genome assembly forensics: finding the elusive mis-assembly. Genome Biol. 9, R55 (2008).
Alkan, C., Coe, B. P. & Eichler, E. E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).
Compeau, P. E., Pevzner, P. A. & Tesler, G. How to apply de Bruijn graphs to genome assembly. Nat. Biotechnol. 29, 987–991 (2011).
Treangen, T. J. & Salzberg, S. L. Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat. Rev. Genet. 13, 36–46 (2011).
Iwamoto, M., Nagashima, H., Nagamine, T., Higo, H. & Higo, K. P-SINE1-like intron of the CatA catalase homologs and phylogenetic relationships among AA-genome Oryza and related species. Theor. Appl. Genet. 98, 853–861 (1999).
Cheng, C., Tsuchimoto, S., Ohtsubo, H. & Ohtsubo, E. Evolutionary relationships among rice species with AA genome based on SINE insertion analysis. Genes Genet. Syst. 77, 323–334 (2002).
Park, K. et al. Genetic variations of AA genome Oryza species measured by MITE-AFLP. Theor. Appl. Genet. 107, 203–209 (2003).
Ren, F., Lu, B. R., Li, S., Huang, J. & Zhu, Y. A comparative study of genetic relationships among the AA-genome Oryza species using RAPD and SSR markers. Theor. Appl. Genet. 108, 113–120 (2003).
Zhu, Q. & Ge, S. Phylogenetic relationships among A-genome species of the genus Oryza revealed by intron sequences of four nuclear genes. New Phytol. 167, 249–265 (2005).
Duan, S. et al. Phylogenetic analysis of AA-genome Oryza species (Poaceae) based on chloroplast, mitochondrial and nuclear DNA sequences. Biochem. Genet. 45, 113–129 (2007).
Waters, D. L., Nock, C. J., Ishikawa, R., Rice, N. & Henry, R. J. Chloroplast genome sequence confirms distinctness of Australian and Asian wild rice. Ecol. Evol. 2, 211–217 (2012).
Zhang, C. et al. High occurrence of functional new chimeric genes in survey of rice chromosome 3 short arm genome sequences. Genome Biol. Evol. 5, 1038–1048 (2013).
Kim, J. H. et al. Diversity and evolution of major Panax species revealed by scanning the entire chloroplast intergenic spacer sequences. Genet. Resour. Crop Evol. 60, 413–425 (2013).
Jung, J., Kim, K. H., Yang, K., Bang, K. H. & Yang, T. J. Practical application of DNA markers for high-throughput authentication of Panax ginseng and Panax quinquefolius from commercial ginseng products. J. Ginseng Res. 38, 123–129 (2014).
Kim, K. et al. Comprehensive survey of genetic diversity in chloroplast genomes and 45S rDNAs within Panax ginseng species. PLoS ONE 10, e0117159 (2015).
Kim, K. J. & Lee, H. L. Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants. DNA Res. 11, 247–261 (2004).
Cho, K. S. et al. Complete chloroplast genome sequence of tartary buckwheat (Fagopyrum tataricum) and comparative analysis with common buckwheat (F. esculentum). PLoS ONE 10, e0125332 (2015).
Kim, S., Park, J. Y. & Yang, T. J. Comparative analysis of complete chloroplast genome sequences of a normal male-fertile cytoplasm and two different cytoplasms conferring cytoplasmic male sterility in onion (Allium cepa L.). The Journal of Horticultural Science & Biotechnology 90, 459–468 (2015).
Allen, G. C., Flores-Vergara, M. A., Krasynanski, S., Kumar, S. & Thompson, W. F. A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nat. Protoc. 1, 2320–2325 (2006).
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
Schwartz, S. et al. Human-mouse alignments with BLASTZ. Genome Res. 13, 103–107 (2003).
Bevan, M. et al. Analysis of 1.9Mb of contiguous sequence from chromosome 4 of Arabidopsis thaliana. Nature 391, 485–488 (1998).
Matsuo, M., Ito, Y., Yamauchi, R. & Obokata, J. The rice nuclear genome continuously integrates, shuffles and eliminates the chloroplast genome to cause chloroplast-nuclear DNA flux. Plant Cell 17, 665–675 (2005).
Wyman, S. K., Jansen, R. K. & Boore, J. L. Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20, 3252–3255 (2004).
Lohse, M., Drechsel, O., Kahlau, S. & Bock, R. OrganellarGenomeDRAW-a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Res. 41, W575–581 (2013).
Frazer, K. A., Pachter, L., Poliakov, A., Rubin, E. M. & Dubchak, I. VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32, W273–W279 (2004).
Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013).
Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012).

Acknowledgements

We thank all members of the Laboratory of Functional Crop Genomics and Biotechnology, Seoul National University, Korea and Phyzen (http://www.phyzen.com/), for their technical assistance. This work was carried out with the support of “Cooperative Research Program for Agriculture Science & Technology Development (Project No. PJ01103001, PJ01100801)”, Rural Development Administration, Republic of Korea and partially supported by a grant entitled “The Genetic Evaluation of Important Biological Resources 2012, 2013” from the National Institute of Biological Resources under the Ministry of Environment, Republic of Korea. Preparation of the NGS data sets of wild Oryza species was supported by the US NSF Plant Genome Program’s Oryza Genome Evolution project #1026200 to R.A.W. & Y.Y.

Author information

Hong-Il Choi
Present address: Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, 56212, Republic of Korea
Kim Kyunghee, Lee Sang-Choon, Lee Junki and Yu Yeisoo contributed equally to this work.

Authors and Affiliations

Department of Plant Science, Plant Genomics and Breeding Institute and Research Institute for Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, 151-921, Republic of Korea
Kyunghee Kim, Sang-Choon Lee, Junki Lee, Kiwoung Yang, Hee-Jong Koh, Nomar Espinosa Waminal, Hong-Il Choi, Nam-Hoon Kim, Woojong Jang, Hyun-Seung Park, Jonghoon Lee, Hyun Oh Lee, Ho Jun Joh, Hyeon Ju Lee, Jee Young Park, Sampath Perumal, Murukarthick Jayakodi, Yun Sun Lee, Backki Kim & Tae-Jin Yang
Phyzen Genome Institute, 501-1, Gwanak Century Tower, Kwanak-gu, Seoul, 151-836, Republic of Korea
Kyunghee Kim, Yeisoo Yu, Beom-Soon Choi & Hyun Oh Lee
Arizona Genomics Institute, School of Plant Sciences, The University of Arizona, Tucson, Arizona, 85721, USA
Yeisoo Yu, Dario Copetti & Rod A. Wing
Department of Horticulture, Sunchon National University, Suncheon, 540-950, Republic of Korea
Kiwoung Yang
Biological and Genetic Resources Assessment Division, National Institute of Biological Resources, Incheon, 404-170, Republic of Korea
Soonok Kim
Department of Plant Biotechnology, Biotechnology Research Institute, Chonnam National University, Gwangju, 500-757, Republic of Korea
Sunggil Kim
Department of Horticultural Science, Kyungpook National University, Daegu, 702-701, Republic of Korea
Ki-Byung Lim
Department of Life Science, Hallym University, Chuncheon, Kangwon-do, 200-702, Republic of Korea
Young-Dong Kim
Green Plant Institute, #2-202 Biovalley, 89 Seoho-ro, Kwonseon-gu, Suwon, Republic of Korea
Jungho Lee
Highland Agriculture Research Institute, National Institute of Crop Science, Rural Development Administration, Pyeongchang-gun, Kangwon-do, 232-955, Republic of Korea
Kwang-Su Cho
Department of Agricultural Biotechnology, National Academy of Agricultural Science, Rural Development Administration, Jeonju, 560-500, Republic of Korea
Beom-Seok Park

Contributions

T.J.Y. designed the research and organized the manuscript. K.K., Jun.L., K.Y., B.S.C., H.I.C., H.S.P., Jon.L., H.O.L., H.J.J., H.J.L., J.Y.P., S.P., M.J. and Y.D.K. participated in assembly of cp genomes and rDNAs, bioinformatic analysis, annotation and validation. H.J.K., Y.Y., N.H.K., W.J., Y.S.L., B.K., Su.K. and K.B.L. conducted sampling, genomic DNA isolation and sequencing. N.E.W. conducted FISH analysis. K.K., S.C.L., Jun.L., Y.Y., K.Y., H.J.K., D.C., So.K., Jun.L., R.A.W., B.S.P., K.S.C. and T.J.Y. wrote and revised the manuscript. All authors approved the final manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Cite this article

Kim, K., Lee, SC., Lee, J. et al. Complete chloroplast and ribosomal sequences for 30 accessions elucidate evolution of Oryza AA genome species. Sci Rep 5, 15655 (2015). https://doi.org/10.1038/srep15655

Received: 07 July 2015
Accepted: 30 September 2015
Published: 28 October 2015
DOI: https://doi.org/10.1038/srep15655
