Introduction

Secale cereale ssp. segetale is one of the many members of the genus Secale whose chloroplast and mitochondrial genomes were previously unknown. Nevertheless, it can be a source of desirable genes (e.g., for disease resistance, high protein content, and morphological and biochemical traits) that can enrich rye or wheat breeding1,2. The lack of knowledge of phylogenetic relationships slows progress in rye breeding, which could be enriched with functional features derived from wild rye species3. With new biotic and abiotic stresses and climate change, there is also a need to study wild rye species, which is crucial to improving the yield and quality of this cereal4. Therefore, more genetic markers are needed. One way to obtain them is to sequence complete chloroplast genomes. Due to their conserved and non-recombinant nature, chloroplast genomes are a solid tool in genomics and evolutionary research5. Certain evolutionary hotspots of the plant plastid genome, such as single nucleotide polymorphisms and insertions/deletions, may provide useful information to elucidate the phylogeny of taxonomically unresolved plant taxa6,7. Thus, the availability of complete chloroplast genomes, which include new variable and informative sites, should help to resolve phylogenies more precisely.

To participate in this effort, we have undertaken the sequencing of complete chloroplast genomes in the genus Secale, which are smaller and easier to analyze than mitochondrial genomes. So far, only the incomplete S. cereale cpDNA sequence (NC_021761)8, three sequences for S. strictum (KY636137, KY636138 and OL979486)9 and one for S. sylvestre (MW557517)10 are available. The chloroplast genome of S. segetale has recently been published11; however, a comprehensive phylogenetic analysis based on whole chloroplast genomes has not been done to date. Therefore, we presume that analysis of the complete chloroplast genome sequences of Secale spp., starting with S. sylvestre10, will be useful and cost-effective for evolutionary and phylogenetic studies, as suggested by our previous studies12.

In this study, we present the complete chloroplast genome of S. cereale ssp. segetale, which will provide valuable information for genetic studies of Secale species.

Results

Chloroplast genome of Secale cereale ssp. segetale

Sequencing of the Secale cereale ssp. segetale chloroplast genome yielded 41,653,350 raw reads, of which 88,777 were mapped to the reference genome of S. cereale, giving 97× average coverage. The S. cereale ssp. segetale cp genome appeared as a typical circular, double-stranded molecule with a length of 137,042 bp (Fig. 1) and an overall GC content of 38%. The large single-copy (LSC) region is 81,060 bp long, the small single-copy (SSC) region is 12,820 bp long, and each of the inverted repeat (IR) regions is 21,581 bp long. The reported cp genome contains 137 genes, including 113 unique genes and 24 genes that are duplicated in the IRs. The group of 113 unique genes comprises 73 protein-coding genes, 30 tRNA genes, four rRNA genes and five conserved chloroplast open reading frames (ORFs) (Table 1).
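The reported region lengths can be cross-checked with simple arithmetic, since a quadripartite plastome has the length LSC + SSC + 2 × IR. The short sketch below performs this check and also estimates the mean coverage from the number of mapped reads. The read length of 150 bp is an assumption (it is not stated in the text), so the coverage estimate is only indicative.

```python
# Region lengths (bp) and mapped read count as reported in the text.
LSC = 81_060
SSC = 12_820
IR = 21_581
REPORTED_TOTAL = 137_042
MAPPED_READS = 88_777


def quadripartite_length(lsc, ssc, ir):
    """Length of a circular plastome that carries two IR copies."""
    return lsc + ssc + 2 * ir


def mean_coverage(n_reads, read_len, genome_len):
    """Average depth = total mapped bases / genome length."""
    return n_reads * read_len / genome_len


total = quadripartite_length(LSC, SSC, IR)
# ASSUMPTION: 150-bp reads; the read length is not given in the text.
coverage = mean_coverage(MAPPED_READS, 150, total)
print(total, round(coverage, 1))
```

Under the assumed read length, the estimate lands close to the reported 97× coverage; a different read length would scale the estimate proportionally.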

Figure 1

Map of the chloroplast genome of Secale cereale ssp. segetale. The genes inside and outside the circle are transcribed in the clockwise and counterclockwise directions, respectively. Genes belonging to different functional groups are shown in different colors. Thick lines indicate the extent of the inverted repeats (IRa and IRb) that separate the genome into small single-copy (SSC) and large single-copy (LSC) regions. The innermost darker gray corresponds to GC content, while the lighter gray corresponds to AT content.

Table 1 Genes present in the chloroplast genome of Secale cereale ssp. segetale. The gene list is arranged alphabetically.

The LSC region was the richest in genes, with 57 protein-coding genes (PCGs), 21 tRNA genes and two ORFs (ycf3 and ycf4), whereas the SSC region contains only ten PCGs and one tRNA gene. The IR contains four rRNA genes, eight tRNA genes, three ORFs (ycf2, ycf15 and ycf68) and nine PCGs, including ndhH, which is located at the junction between the IR and SSC regions.

Repeat sequence analysis

A total of 52 repeat sequence structures, ranging in length from 30 to 286 bp, were revealed in the plastome of Secale cereale ssp. segetale (Table 2). Forward repeats (37) dominated over palindromic repeats (15). Neither complementary nor reverse repeats were found. Most repeat sequences (69.3%) were detected in the LSC region, followed by the IR (28.8%) and SSC (1.9%) regions. Half of these sequences were found within coding regions. The highest number of repeats was found within the sequences of the following genes: rpoC2 (9F), rpl23 (2F and 2P) and rps18 (3F and 1P).

Table 2 List of repeated sequences in the chloroplast genome of Secale cereale ssp. segetale.

A total of 29 SSRs were detected in the Secale cereale ssp. segetale chloroplast genome (Table 3). Mononucleotide SSRs composed of A/T units were the most common, whereas hexanucleotide SSRs were not detected. Of the SSRs, 79.3% were located within the LSC region, 13.8% in the IR region and only 6.9% in the SSC region. Most of the SSRs were identified within intergenic spacers (58.6%), while equal proportions (20.7% each) were located in introns and coding sequences.

Table 3 Distribution of SSRs in the Secale cereale ssp. segetale cp genome.

Multigene phylogeny

Phylogeny reconstruction based on the sequences of 73 protein-coding genes shared by Secale cereale ssp. segetale and 38 representatives of the Pooideae subfamily was consistent with the systematic position of the studied species. The BI and ML trees divided the analyzed species into six major clades (Fig. 2). The first clade contained 23 species representing the Triticinae subtribe, four other clades gathered 13 species representing the Hordeinae subtribe, and the last clade consisted of three Littledalea species (Littledaleeae tribe). The Secale cereale ssp. segetale sequence obtained here shared the highest degree of similarity with S. cereale and the previously published S. cereale ssp. segetale sequence. The five Secale sequences included in the analysis form a separate sub-clade within the Triticinae subtribe.

Figure 2

Cladogram illustrating the phylogenetic relationships of Secale cereale ssp. segetale based on complete cp genome sequences. The phylogenetic tree is based on the sequences of 73 shared protein-coding genes from five Secale species and 34 other cereal lineages representing the Triticodae group within the subfamily Pooideae, with the cp genome of Oryza sativa as an outgroup, and was inferred using Bayesian inference (posterior probabilities, PP) and maximum likelihood (ML). Each node has a 100% bootstrap support value. The cpDNA sequence obtained in this study is shown in bold.

Comparison with other complete chloroplast genomes of the Secale species

The overall sequence identity of the five Secale cp genomes was plotted using mVISTA, with the annotation of the S. cereale ssp. segetale cp genome (newly sequenced in this study) as the reference (Fig. 3). The results showed that the Secale cp genomes exhibited a high level of sequence synteny, suggesting a conserved evolutionary pattern. The plastome sequences were fairly conserved across the compared genomes, with only a few variable regions. The exon sequences were nearly identical across all taxa.

Figure 3

Percentage of sequence identity between the chloroplast genomes of Secale cereale ssp. segetale and the other four Secale accessions, plotted using the mVISTA program. Gray arrows on the top line show the direction of transcription. The y-axis represents the average percent identity between the sequences of S. cereale ssp. segetale and the other four Secale chloroplast genomes. The x-axis represents the coordinate in the chloroplast genome, using S. cereale ssp. segetale as the reference. Genome regions are color-coded as exons, untranslated regions (UTR) and conserved non-coding sequences (CNS).

Discussion

The task of modern cereal breeding is to obtain new, higher-yielding varieties that have high resistance to pathogens, diseases and abiotic conditions. Unfortunately, progress in rye breeding has been limited, as the varieties used in cultivation have had limited variability due to selection. In addition, attempts to use old varieties have been unsuccessful.

A major advance in rye breeding has been the introduction of hybrid varieties, in which individual genotypes are fixed by continued self-pollination and monogenic traits are transferred into varieties13. However, despite the increase in yield, intermediate quality traits are subject to large annual fluctuations. Thus, although grain yield increased significantly and protein content decreased in the experiments, the yield gains had no significant positive or negative effect on intermediate quality traits4.

A number of taxa in the genus Secale may represent a potential source of genetic variability in rye breeding3. Species such as Secale strictum and Secale vavilovii may be sources of new genetic variability, including resistance to ear fusariosis and septoria leaf blotch, while Secale vavilovii may also be a source of sterilizing cytoplasm. Wild rye species and subspecies provide excellent starting material for studies aimed at expanding recombination variability in cultivated rye and triticale (× Triticosecale Wittmack). Because of their genetic distinctiveness and high trait expression, they represent a valuable source of genes in which our cultivars are deficient14. One example is a study of the efficiency of crossing the wild species Secale vavilovii and the rye subspecies Secale cereale ssp. afghanicum, Secale cereale ssp. ancestrale, Secale cereale ssp. dighoricum and Secale cereale ssp. segetale with the crop species Secale cereale ssp. cereale; the resulting F1 crosses may be a potential source of variation in common rye3. Unfortunately, the lack of knowledge of phylogenetic relationships slows progress in rye breeding.

Chloroplast genome sequences are very useful for understanding plant origin and evolution. Because the chloroplast genome is maternally inherited, relatively small and slowly mutating15, phylogenetic analysis of multiple chloroplast DNA sequences can clarify plant phylogeny, population genetics and taxonomic status at the molecular level16.

Although the cp genomes of angiosperms are generally conserved in terms of sequence and gene number17, structural variation that differs across families and genera has been observed, such as gene duplication and large-scale rearrangements of genes, introns and IR regions (e.g.18,19).

The S. cereale ssp. segetale cp genome appeared as a typical circular, double-stranded molecule (Fig. 1). Its size (137,042 bp) and overall GC content (38%) are similar to those of the previously sequenced plastomes of S. cereale (137,051 bp; NCBI LC645358) and S. sylvestre (137,116 bp)10, and its size falls within the range reported for angiosperms20.

The results obtained by Du et al.11 are similar to ours. The genome size and the lengths of the LSC, SSC and IR regions differ only slightly. In contrast, larger differences are seen in the number of genes. The genome we analyzed contains 73 protein-coding genes (82 in11), 30 tRNA genes (41 in11), four rRNA genes (8 in11) and five conserved chloroplast open reading frames (ORFs; not reported in11).

It is difficult to say where the above-mentioned differences came from. The rich intraspecific genetic diversity of S. cereale ssp. segetale has been previously reported (e.g.21). Significant differences were found between and within populations of S. c. ssp. segetale. A high degree of genetic variability has also been described using chromosomal markers22,23. These results deserve attention and further research.

The polymorphisms found in S. c. ssp. segetale chloroplast genome sequences can be used, for example, to elucidate evolutionary histories, such as the origin of Secale species or accessions, at the inter-species and, thanks to the research described in this manuscript, intra-species level. Furthermore, the polymorphic sites support practical applications of molecular analysis to protect S. c. ssp. segetale accessions24 and, potentially in the long term, to benefit the rye breeding industry. Unfortunately, the analyses of the genome previously published by Du et al. omit many details beyond those mentioned above, which prevents a more detailed comparison.

Certain regions of the plastome are predisposed to indel and substitution mutations. Comparative studies of the plastome show the evolution of, among others, tandem repeats and their role in generating substitutions and indels25,26. Once the composition of repeat sequences in the plastome is determined, it is possible to predict microstructural changes by analyzing the correlation between repeats, indels and substitutions. In addition to the paucity of genomic resources, the phylogeny of the genus Secale is enigmatic (e.g.27,28). Therefore, it is important to fully explore the polymorphic regions of Secale chloroplasts in an evolutionary context.

Of the 52 repeat sequence structures revealed in the Secale cereale ssp. segetale plastome, the vast majority were detected in the LSC region (Table 2). The highest number of repeats was found within the sequences of the rpoC2, rpl23 and rps18 genes. Regardless of its function, the rpoC2 gene, which encodes the β″ subunit of plastid RNA polymerase, is a relatively rapidly evolving chloroplast sequence29. Similarly, the rpl23 gene and its pseudogene, both observed in the grass family, are highly polymorphic and are considered hotspots of illegitimate recombination in cp genomes30.

Identification of chloroplast SSRs not only characterizes the cp genome but also provides ideal molecular tools with various applications, such as investigating domestication history, sites of origin, or genetic diversity and relationships between wild and cultivated species31,32. In 2016, Hagenblad et al.33 analyzed the genetic diversity of 76 accessions of wild, feral and cultivated rye based on SNPs. They also analyzed five chloroplast SSRs derived from Lolium and wheat. Discriminant analysis of principal components (DAPC) of the cpSSR data indicated very large genetic variation within the genus Secale and did not reflect taxonomic groups, except for S. strictum and S. africanum, which formed a separate cluster.

CpSSRs are mainly distributed within intergenic spacers of Secale plastomes; similar distribution preferences of cpSSRs have been reported in Avena spp., Pseudoroegneria libanotica and Salvia miltiorrhiza34,35,36.

Phylogenetic analysis showed that the Secale cereale ssp. segetale sequence obtained here shares the highest degree of similarity with S. cereale and the previously published S. cereale ssp. segetale sequence. The five Secale sequences form a separate sub-clade within the Triticinae subtribe, which confirms previous phylogenetic data for the genus Secale (e.g.37).

The Secale cp genomes exhibited a high level of sequence synteny, suggesting a conserved evolutionary pattern. The plastome sequences were fairly conserved across the compared genomes, with only a few variable regions. The exon sequences were nearly identical across all taxa.

Conclusions

Here we assembled the complete, annotated chloroplast genome sequence of Secale cereale ssp. segetale. The genome is 137,042 base pairs (bp) long and contains 137 genes, including 113 unique genes and 24 genes that are duplicated in the IRs. The phylogenetic analysis showed that Secale cereale ssp. segetale shares the highest degree of similarity with S. cereale and S. strictum. Intraspecific diversity has been observed between the published chloroplast genome sequences of S. cereale ssp. segetale. The cp genome will provide a resource for evolutionary and genetic studies of rye species. The assembled genome sequence and annotation information have been deposited in GenBank under the accession number OL688773.

Material and methods

DNA extraction, sequencing, assembly and annotation

Seeds of Secale cereale ssp. segetale introd. no. 1782/94 were obtained from the Botanical Garden of the Polish Academy of Sciences in Warsaw. Total DNA was extracted from young sprouts following Doyle and Doyle38.

The chloroplast (cp) genome of Secale cereale ssp. segetale was sequenced using the DNBseq platform at BGI Shenzhen (China). After a quality check (FastQC tool available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc), the raw reads were mapped to the reference genome of Secale cereale (NC_021761) in Geneious v.R7 software with default medium–low sensitivity settings39. Reads aligned to the reference cpDNA genome were extracted and used for de novo assembly (k-mer: 23–41; low coverage cut-off: 5; minimum contig length: 300). De novo contigs were extended by mapping raw reads to the generated contigs, reassembling the contigs with the mapped reads, and manually scaffolding the extended contigs (minimum sequence overlap of 50 bp and 97% overlap identity). This process was iterated five times. Finally, the reduced sequences were assembled into the circular chloroplast genome. The chloroplast genome was annotated using MFannot40 and PlasMapper41 with manual adjustments. The gene map of the annotated cp genome was drawn with the OrganellarGenomeDRAW tool42.

Repeat sequence analysis

The chloroplast simple sequence repeats (SSRs) were detected using Phobos v.3.3.1243. Only perfect SSRs with a motif size of one to six nucleotide units were considered; the following thresholds for chloroplast SSR identification were used: ≥ 12 repeat units for mononucleotide SSRs, ≥ 6 repeat units for dinucleotide SSRs, ≥ 4 repeat units for trinucleotide SSRs, and ≥ 3 repeat units for tetra-, penta- and hexanucleotide SSRs44. Analysis of long genomic repeats, i.e. forward (F), reverse (R), palindromic (P) and complementary (C) sequences, was performed using REPuter software45 with the following settings: (1) Hamming distance of 3, (2) sequence identity ≥ 90%, and (3) minimum repeat size ≥ 30 bp. A single IR region was used to eliminate the influence of the doubled IR regions.
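The SSR thresholds listed above can be illustrated with a minimal regular-expression sketch. This is not the Phobos algorithm; it is a simplified stand-in that only finds perfect repeats, and the function names and toy sequence are hypothetical. Motifs that are themselves repetitions of a shorter unit (e.g. "AA" or "ATAT") are skipped so that each tract is reported once under its shortest motif.

```python
import re

# Minimum number of repeat units per motif length, as listed in the text.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def find_perfect_ssrs(seq):
    """Return (start, end, motif, n_units) tuples; 0-based, end-exclusive."""
    seq = seq.upper()
    hits = []
    for motif_len, min_units in MIN_REPEATS.items():
        # One motif followed by at least (min_units - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_units - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                n_units = (m.end() - m.start()) // motif_len
                hits.append((m.start(), m.end(), motif, n_units))
    return sorted(hits)


# Toy sequence (hypothetical, not taken from the plastome).
toy = "GC" + "A" * 12 + "GCGG" + "AT" * 6 + "GG"
ssrs = find_perfect_ssrs(toy)
print(ssrs)
```

Because the regular-expression scan is non-overlapping, this sketch may miss an SSR that overlaps a skipped non-primitive match; a dedicated tool such as Phobos handles such cases more rigorously.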

Multigene phylogeny

The phylogenetic position of Secale cereale ssp. segetale within the Triticodae group was also evaluated. For that purpose, 73 concatenated protein-coding gene sequences shared with 38 other Pooideae species were used. The cpDNA of Oryza sativa was used as an outgroup (Table 4). The Bayesian inference (BI) method was used for phylogeny reconstruction. The best-fit model of sequence evolution was identified in MEGA v.746, and the GTR + G + I model was selected. The BI analysis was performed in MrBayes v.3.2.647. Parameter settings were previously described by Androsiuk et al.48.
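Concatenating per-gene alignments into a single supermatrix, as described above, can be sketched as follows. The function name, the input layout (a dictionary of gene alignments) and the toy data are assumptions made for illustration; the text does not state which tool was used for concatenation. The sketch also records the column range of each gene, which is the information a by-gene partitioning scheme would need.

```python
def concatenate_genes(gene_alignments):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: dict mapping gene name -> {taxon: aligned sequence}.
    Returns (supermatrix, partitions), where partitions holds
    (gene, start, end) with 1-based inclusive column ranges.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    parts = {t: [] for t in taxa}
    partitions = []
    pos = 1
    for gene, aln in gene_alignments.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        length = lengths.pop()
        for t in taxa:
            # A taxon lacking this gene is padded with gaps (not needed when
            # all genes are shared, as in the text, but kept for safety).
            parts[t].append(aln.get(t, "-" * length))
        partitions.append((gene, pos, pos + length - 1))
        pos += length
    supermatrix = {t: "".join(p) for t, p in parts.items()}
    return supermatrix, partitions


# Toy alignments (hypothetical sequences and taxon labels).
toy = {
    "atpA": {"taxon_A": "ATGGC", "taxon_B": "ATGGT"},
    "rbcL": {"taxon_A": "ATGTCA", "taxon_B": "ATG-CA"},
}
matrix, ranges = concatenate_genes(toy)
print(matrix, ranges)
```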

Table 4 List of species used in phylogenetic studies. Species names arranged alphabetically.

For the multigene phylogeny, maximum likelihood (ML) analysis was conducted using RAxML-NG49 under three different strategies: (1) one of the IR regions was removed from all chloroplast genomes to reduce overrepresentation of duplicated sequences, and RAxML-NG was then run on the unpartitioned alignment under the GTR + I + G substitution model as a single partition; (2) the same data were partitioned by gene, exon, intron and intergenic spacer regions, and separate base frequencies, α-shape parameters and evolutionary rates were estimated for each partition; (3) the best-fitting partitioning strategy for the alignment was inferred with PartitionFinder250. The best-fitting nucleotide substitution models were inferred with jModelTest251. Phylogenetic trees were visualized and edited with FigTree 1.4.452. Support for the ML tree branches was calculated by the non-parametric bootstrap method with 1000 replicates.
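A possible way to script the first two strategies is sketched below. The helper only builds RAxML-NG command lines and does not run them. The file names are hypothetical, and the options shown (--all, --msa, --model, --bs-trees, --prefix) reflect our understanding of the RAxML-NG command-line interface, so they should be checked against `raxml-ng --help` for the installed version. The exact commands used in this study are not given in the text.

```python
import shlex


def raxml_ng_command(msa, model, prefix, bootstraps=1000):
    """Build a RAxML-NG command for an ML search plus bootstrapping.

    `model` is either a model string (e.g. "GTR+I+G") or the path to a
    partition file describing one model per partition.
    """
    return [
        "raxml-ng", "--all",
        "--msa", msa,
        "--model", model,
        "--bs-trees", str(bootstraps),
        "--prefix", prefix,
    ]


# Hypothetical file names, used only for illustration.
unpartitioned = raxml_ng_command("alignment.fasta", "GTR+I+G", "strategy1")
partitioned = raxml_ng_command("alignment.fasta", "partitions.txt", "strategy2")
print(shlex.join(unpartitioned))
print(shlex.join(partitioned))
```

Keeping the command as a list of arguments makes it straightforward to pass to `subprocess.run` once the alignment and partition files exist.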

Comparison with other complete chloroplast genomes of the Secale species

The percentage of sequence identity among the complete chloroplast genomes of the five Secale accessions, S. cereale ssp. segetale (OL688773), S. cereale ssp. segetale (LC645358), S. cereale (NC_021761), S. strictum (KY636137) and S. sylvestre (MW557517), was comparatively analyzed and plotted using the program mVISTA53, with the LAGAN alignment algorithm54, a cut-off of 70% identity, and the annotation of S. cereale ssp. segetale (OL688773) as the reference.
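The idea behind an identity plot with a 70% cut-off can be illustrated with a simple windowed calculation on a pairwise alignment. This is a simplified sketch, not the mVISTA or LAGAN implementation. The window size, function names and toy sequences are assumptions, and the input is presumed to be already aligned.

```python
def window_identity(ref, other, window=100, step=100):
    """Percent identity of `other` to `ref` in consecutive alignment windows.

    Both strings must come from the same pairwise alignment (equal length,
    '-' for gaps). Returns (window_start, percent_identity) pairs, where
    window_start is a 0-based alignment column.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to equal length")
    ref, other = ref.upper(), other.upper()
    profile = []
    for start in range(0, len(ref), step):
        cols = list(zip(ref[start:start + window], other[start:start + window]))
        # A column counts as identical only if both bases match and are not gaps.
        matches = sum(1 for a, b in cols if a == b and a != "-")
        profile.append((start, 100.0 * matches / len(cols)))
    return profile


def conserved_windows(profile, cutoff=70.0):
    """Window starts whose identity reaches the cut-off (70% in the text)."""
    return [start for start, pct in profile if pct >= cutoff]


# Toy aligned pair (hypothetical), analyzed with 5-column windows.
profile = window_identity("ACGTAACGTA", "ACGTAAC--T", window=5, step=5)
print(profile, conserved_windows(profile))
```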

Ethics approval and consent to participate

Authors confirm that the use of plants in the present study complies with international, national and/or institutional guidelines.