Introduction

The origin of the genetic code and the translation event is major transition points in evolutionary biology. The triplet genetic code is hailed as one of the most important and ultimate evolutionary anchors and indisputable evidence of life. The triplet genetic code understands the specific assignment of the amino acids in the translation machinery Cells use the universal manual and a guided dictionary to translate the corresponding amino acids into the translating protein. The number of codon combinations on the mRNA can be an astounding number of feasible protein sequences, from which only a few can be found in nature. The triplet genetic code is believed to be universal and degenerate and accommodates twenty essential amino acids using sixty-one sense and three-stop codons. However, an emerging study has proved that the "Universal Genetic Code" is no longer universal and can be called as canonical1,2. Sometimes nature enhances the protein functionalities through codon reassignment to incorporate new amino acids. This has led to the discovery of the role of selenocysteine (Sec) and pyrrolysine (Pyl) amino acids in the protein through the assignment of the stop codon as the sense codon. However, sense codon reassignment requires low-frequency codons, so stop codons are used for this purpose. It has been demonstrated that except for the triple codons, the Escherichia coli ribosome can accommodate codons and anticodons of variable sizes3.

Taking this opportunity, they have translated four base codon pairs, CCCU, AGGA, UAGA, and CUAG, using the four base anticodons3. Frame-shifting of the + 1 nucleotide is most favorable in the absence of suppressor tRNA in E. coli 3. The study also reveals that frame maintenance during translation is not absolute, and mutant tRNA can promote a frameshift, which can occur with high frequencies at the programmed site of the mRNA4,5. Riddle and Carbon (1973) reported the presence of four base anticodons CCCC in tRNAGly instead of the wild-type CCC6. A study by Mohanta et al.14 revealed the presence of nine nucleotide anticodons instead of seven nucleotides7. These features in the tRNAs certainly explain the existence of extended codons and anticodons. Most likely, these evolutionary scenarios exist in codons and tRNAs to meet the novel translational demand.

The availability of enormous genome sequencing data is quite valuable for digging deep into the molecular features of the protein translation machinery. Significant studies have been performed in the field of codons and tRNAs (anticodons), yet, several things need to be explored. Taking this opportunity, we have conducted a large-scale study to deduce the anticodon table of the chloroplast genome to understand the existence of reduced or extended genetic codes/anticodons in tRNAs. Furthermore, we have also tried to understand the presence of Sec and Pyl tRNAs, which are part of the extended genetic code. Furthermore, an investigation has also been conducted to understand the presence of different introns and the presence of possible spacer tRNA and tRNA fragments.

Results

tRNAs with ACU, CUG, GCG, CUC, CCC, and CGG anticodons are absent in the chloroplast genome

Analysis of the chloroplast genome of the 5959 species from Algae (303), Bryophyte (69), Eudicot (3832), Gymnosperm (153), Magnoliids (182), Monocot (1177), Nymphaeales (34), protist (57), Pteridophyte (139), and unknown (13) led to the discovery of 215,966 tRNA genes. We did not find any tRNA encoded for ACU, CUG, GCG, CUC, CCC, and CGG anticodons from them (Table 1). Furthermore, we also found several anticodons, which were rare in the chloroplast genome. They were AGU (tRNAThr), AAG (tRNALeu), CGC (tRNAAla), UCA (tRNASup), AGG (tRNAPro), AUU (tRNAAsn), UAU (tRNAIle), AUA (tRNATyr), CAG (tRNALeu), CUU (tRNALys), CCU (tRNAArg), AAU (tRNAIle), and GAG (tRNALeu) (Table 1, Supplementary File 1). The tRNA with anticodons AAG, AGU, and CGC was found only once, whereas the tRNA with anticodon UCA, AGG, and AUU was found twice for each (Table 1, Supplementary Files 1, 2). However, the percentage of the CAU (5.47%, tRNAMet) anticodon was the highest among all the 64 anticodons. The abundance of the CAU anticodon was followed by GUU, UGC, ACG, and others (Table 1).

Table 1 Anticodon Table of the chloroplast genome. Study from 5959 chloroplast genomes shows several rare anticodons.

Chloroplast genome encodes tRNA for N-formylmethionine, Ile2, selenocysteine, and pyrrolysine

A study revealed a chloroplast genome was found to encode tRNAs for tRNAfMet, tRNAIle2, tRNASel, and tRNAPyl (Table 1). The tRNAfMet was encoded by the same CAU anticodon that coded tRNAMet. We found 709 (0.33%) genes that encoded tRNAfMet (Table 1). Also, tRNAIle encoded by the CAU anticodon was commonly referred to as tRNAIle2 (Table 1). We found at least 10,575 (4.93%) tRNA genes encoding tRNAIle2 (Table 1). Selenocysteine amino acid was encoded by a previously known stop codon UCA. At least 204 chloroplast genes were found to encode the UCA anticodon for tRNASel (Table 1, Supplementary File 3). A chloroplast genome was also found to encode 197 genes for CUA anticodons that encoded tRNAPyl (Table 1). However, we did not find any CUA anticodon that encoded the suppressor tRNA (Table 1).

Chloroplast genome encodes putative duplet and quadruplet anticodons

We have already mentioned that the triplet genetic code is not universal; it is canonical. Therefore, the genome might have suppressed or extended the genetic code, which is yet to be elucidated, to a greater extent. Our study found that the chloroplast genome encodes the putative duplet and quadruplet anticodons (Supplementary Files 4, 5). The annotation of tRNA with quadruplet anticodon was found when chloroplast genomes were annotated in the GeSeq chlorobox (https://chlorobox.mpimp-golm.mpg.de/geseq.html). However, re-analysis of the tRNA with the quadruplet anticodon in tRNAscan-SE did not result in a tRNA with a quadruplet anticodon, which might be due to the default setting for identification of a tRNA with a triplet anticodon. We are the first to report the presence of duplet and quadruplet anticodons in the chloroplast genome of the plant kingdom. We found that at least 91 species were encoded quadruplet anticodons (Supplementary File 4). The quadruplet anticodons were UAUG, UGGG, AUAA, GCUA, and GUUA (Supplementary File 4). The quadruplet anticodon GUUA found in Gossypium sturtianum (NC_023218.1) putatively encoded tRNAAsn. Similarly, at least 13 species were found to encode duplet (two nucleotides) anticodons in the tRNAs of the chloroplast genome (Supplementary File 5). Among them, there were at least eight putative unique duplet anticodons, namely UG, AG, AU, CA, GA, GG, GU, and UA (Supplementary File 5). The putative duplet anticodons might have been caused by the loss of a nucleotide from the anticodon because, if there were duplet anticodons, the genome could encode only 16 anticodons in its genome and would not be able to accommodate all the 20 coding amino acids in the protein. However, there is a high possibility of having quadruplet anticodons in the tRNAs because, in a quadruplet anticodon table, there are 256 possibilities to encode different amino acids into the protein (Supplementary Table 1).

Parasitic organisms have lost the tRNA genes in their chloroplast genome

We found that some of the chloroplast genomes had lost the tRNA genes. The species that have been found to have lost the tRNA genes are Pilostyles aethiopica (NC_029235.1) (Fig. 1) and Pilostyles hamiltonii (NC_029236.1) (Supplementary File 6). Pilostyles aethiopica and Pilostyles hamiltonii are endoparasitic plants. Furthermore, some other plants have encoded fewer tRNAs in their chloroplast genome (Supplementary File 6). They are Asarum minus (5), Gastrodia elata (5), Sciaphila densiflora (6), Epirixanthes elongata (8), Burmannia oblonga (8), Lecanorchis japonica (8), Lecanorchis kiusiana (9), and Selaginella tamariscina (9) (Supplementary File 6). The mentioned species encoded less than ten tRNA genes in their chloroplast genome. Gastrodia elata is a saprophyte, whereas, Sciaphila densiflora, Epirixanthes elongate, Burmannia oblonga, Lecanorchis japonica, and Licanorchis kiusiana are mycoheterotrophic, and Cystopteris chinensis is an endangered species.

Figure 1
figure 1

OGDRAW map of Pilostyles aethiopica (NC_029235.1) chloroplast genome. The map shows the loss of tRNA genes and inverted repeats.

The chloroplast genome of Asarum minus encoded UUU (tRNALys), UUG (tRNAGln), GCU (TrnaSer), UCC (tRNAGly), and UCU (tRNAArg); Gastrodia elata encoded UUG (tRNAGln), GCA (tRNACys), UUC(tRNAGlu), CAU(tRNAfMet), and CCA(tRNATrp); Sciaphila densiflora encoded UUG (tRNAGln), CAU (tRNAIle), CCA(tRNATrp), CAU(TrnafMet), UUC(tRNAGlu), and GCA(tRNACys); Epirixanthes elongata encoded CCA (tRNATrp), CAU(tRNAfMet), UUG(tRNAGln), GUC(tRNAAsp), GUA(tRNATyr), and UUC(tRNAGlu); Burmannia oblonga encoded UUG (tRNAGln), GCA (tRNACys), GUA (tRNATyr), UCC (tRNAGlu), CAU (tRNAfMet), GUG (tRNAHis), and CAU (tRNAIle); Lecanorchis japonica encoded UUG (tRNAGln), GCA (tRNACys), GUC (tRNAAsp), CAU (tRNAfMet), GAA (tRNAPhe), CAU (tRNAIle), and GUU (tRNAAsn); Lecanorchis kiusiana encoded UUG (tRNAGln), GCA (tRNACys), GUC (tRNAAsp), UUC (tRNAGlu), CAU (tRNAfMet), GAA (tRNAPhe), CAU (tRNAIle), and GUU (tRNAAsn); and Selaginella tamariscina encoded GUG (tRNAHis), GUC (tRNAAsp), GUA (tRNATyr), UUC (tRNAGlu), GUU (tRNAAsn), and CCA (tRNATrp). These species encoded only 14 anticodons CAU, CCA, GAA, GCA, GCU, GUA, GUC, GUG, GUU UCC, UCU, UUC, UUG, and UUU.

Chloroplast genome encodes putative spacer tRNAs

Spacer RNA genes are usually found in bacterial genomes in the spacer region between the 16S and 23S rRNAs. When we focused our study on the spacer RNA in the chloroplast genome, we found that chloroplast genomes were also encoded in the putative spacer tRNAs between the 16 and 23S rRNA genes. tRNAAla (UGC) and tRNAIle (GAU) were the most predominant spacer tRNAs found in the chloroplast genome (Fig. 2). The percentages of the UCG and GAU anticodons in the chloroplast genome were 5.13 and 4.98, respectively. This showed that spacer tRNAs were more common in the chloroplast genome. Sometimes, it contained tRNAfMet (CAU) and tRNASer (GCU) in the spacer region. All the chloroplast genomes did not encode the spacer tRNAs (Supplementary File 7). None of the mycoparasitic plants was found to encode the putative spacer tRNA in their chloroplast genome. However, the majority of the species encoded putative spacer tRNAs.

Figure 2
figure 2

OGDRAWM map of (A) Asparagus officinalis (NC_034777.1) and (B) Populus yunnanensis (NC_037421.1) chloroplast genomes. (A) Asparagus officinalis shows the presence of a putative spacer tRNAs. tRNAAla (UGC) and tRNAIle (GAU) are present between 16 and 23S rRNA in A. officinalis. No putative spacer tRNA is found in the chloroplast genome of Populus yunnanensis.

The majority of chloroplast tRNAs encode group I intron

It was found that the majority of chloroplast-encoding tRNAs encode introns. Except for tRNAArg, tRNAAsn, tRNAAsp, tRNAGln, tRNAHis, tRNAPro, tRNATrp, and tRNAVal all other tRNA genes were found to contain group I introns (Table 2). The introns found in tRNA seem to be isotype-specific (Table 2). The introns are conserved within the tRNA isotype, and the conserved nucleotide sequences of the introns of one isotype do not match with the conserved introns of other isotypes (Table 2). When we cluster the conserved region of the introns, they form four groups (Supplementary Figure 1). We have named them groups A, B, C, and D. Group A contains tRNALeu, tRNATyr, and tRNACys; group B contains tRNASer; group C contains tRNALys, tRNAMet, and tRNAAla, and group D contains tRNAGly, tRNAIle, tRNAGlu, and tRNAThr (Supplementary Figure 1). However, the introns of tRNAPhe do not group with any other introns (Supplementary Figure 1).

Table 2 Conservation of introns in chloroplast tRNAs. From the mentioned tRNA isotypes, at least eight isotypes do not encode any intron in their tRNA genes or its not conserved.

Chloroplast genome encodes putative novel tRNAs

Although we all are well-acquainted with the fact that tRNA makes a clover leaf-like structure, we found some variations in the tRNA structure. Analysis revealed the presence of a few novel tRNA structure/tRNA-like molecules (Fig. 3 and Fig. 4). Some putative novel tRNA-like structures seemed to lack the anticodon loop, whereas, in some cases, they had extra sequences near the anticodon arm region (Fig. 3). A tRNA-like structure contained an extended nucleotide sequence in the region between the D-arm and anticodon arm (Fig. 4). At least 42 species were found to encode novel tRNA-like structures that contained extended nucleotide sequences between the D-arm and anticodon arm (Fig. 4). Furthermore, a few tRNAs were found to have lost the pseudouridine loop (Fig. 5), suggesting the presence of novel tRNAs/tRNA-like structures in the chloroplast genome.

Figure 3
figure 3

Putative novel tRNAs in the chloroplast genome. (A) In the tRNA of Pedicularis ishidoyana (NC_029700.1) there is a long nucleotide sequence present in between the D arm and anticodon arm. (B) In Entransia fimbriata (NC_030313.1) tRNA (tRNALysUUU) a long nucleotide sequence is present in the anticodon loop region that masks the anticodon loop. (C) In Syntrichia ruralis (NC_012052.1) tRNAGlyUCC, a long nucleotide sequence is found in between the D-arm and anticodon arm.

Figure 4
figure 4

Putative novel tRNA of chloroplast tRNA. The tRNA contains a long nucleotide sequence in between the D-arm and anticodon arm. At least 42 chloroplast genomes are found to encode a similar tRNA structure in it. The structure was predicted using the tRNAscan-SE 2.0 program.

Figure 5
figure 5

Fig. 5 shows the presence of putative nematode mitochondrial tRNA in the chloroplast genome. The tRNAs have been seen to lose the Ψ-arm and Ψ-loop. The presence of nematode mitochondrial genome in the chloroplast genome shows that the truncated tRNAs are shared in between the chloroplast and mitochondria. The structure has been predicted using the Aragorn software.

Chloroplast genome encodes putative tRNA Fragments (tRFs)

Analysis revealed the presence of at least 55 tRFs in the chloroplast genome. The tRFs are small 14–32 nucleotides novel class of small, non-coding RNAs, derived from the mature or precursor tRNAs that are different from the tRNA-derived, stress-induced tRNAs (tiRNAs)8,9. The tRFs found were for tRNAGlu, tRNAArg, tRNAGly, tRNAHis, tRNAVal, tRNAIle, tRNAThr, tRNALeu, tRNALys, and tRNAAla (Supplementary File 8). The tRFs of tRNAGlu were found to contain conserved nucleotide sequence GGCCTTATCGTCTAGTGAT, whereas those of tRNAGly were found to contain conserved GCGGGTATAGTTTAGTGGTAAA nucleotides (Supplementary File 8). As such, we did not find conserved nucleotide sequences for the other tRFs. The tRFs of tRNAAla, tRNAGly, tRNAIle, tRNALys, and tRNALeu were 5ˈ-tRFs, whereas the tRFs of tRNAHis, tRNAThr, and tRNAVal were 3ˈ-tRFs. The tRFs of tRNAGlu did not match either the 5ˈ- or 3ˈ-end of the tRNA, which might have originated from the precursor tRNA transcript. Therefore, they can be classified as tRF-1.

Chloroplast genome encodes putative tiRNAs

The longer tRFs (tRNA fragments) of 30–50 nucleotide-long sequences are called tRNA-derived, stress-induced RNAs (tiRNAs)8. Therefore, we searched for the presence of 30–50 nucleotide tRFs. We found at least 244 tRNA sequences, which encoded the 30–50 nucleotides (Supplementary File 9). The tiRFs were part of putative tRNAAla (UGC), tRNAPhe (GAA), tRNAfMet (CAU), tRNAGly (GCC, UCC), tRNAHis (GUG), tRNAIle (CAU, GAU), tRNALys (UUU), tRNALeu (UAA), tRNAAsn (GUU), and tRNAVal (GAC, UAC) (Supplementary File 9). Among them, tiRFs of tRNAHis (GUG) and tRNAfMet (CAU) were found only once, whereas tRNALys (UUU) was the highest (72) encoding tiRF. The tiRFs of tRNALys (UUU) were followed by tRNAIle (GAU) and tRNAAla (UGC), which were found to contain 51 and 52 putative tiRFs, respectively (Supplementary File 9).

Machine machine-learning approach showed GC% influences the tRNA number in the chloroplast genome

We grouped the chloroplast genomes of all the species according to their clade and conducted a comparative study. The analysis revealed that the average tRNA gene number in monocot (37.80%) plants is comparatively higher than in other plants (Supplementary File 6). The protists showed the lowest (29.5%) average tRNA gene number, followed by algae (30.12%) (Supplementary File 6). A correlation analysis of the GC% with the tRNA number showed a positive correlation (r = 0.362) for the monocot clade (Fig. 6). The chloroplast genomes of the species Isolepis setacea and Vitis romanetii were found to encode the highest number of tRNAs, that is, 52 each (Supplementary File 6). On average, the chloroplast genomes were found to encode 36 tRNA genes per genome. A machine-learning approach was used to understand the role of the GC content and genome size in the tRNA number in the chloroplast genome. The boosting analysis revealed that the relative influence of the GC% was more than the genome size (Fig. 7). A principal component analysis was conducted to see their association with different clades.

Figure 6
figure 6

Correlation regression analysis (r = 0.362) of GC % and tRNA gene number in the chloroplast genome. Analysis showed a slight positive correlation between the chloroplast genome’s GC% and tRNA gene number. The analysis was conducted at a p value < 0.05. Correlation analysis was conducted using the JASP 0.16.1.0 version software.

Figure 7
figure 7

Machine-learning analysis of GC % content and genome size in the tRNA gene number in the chloroplast genome; the random forest approach was used to run the analysis. In the study, from 5959 species, 3814 species were used as training sets, 954 for validation, and 1191 as test sets. All the analysis was conducted at p < 0.05. Analysis revealed that the GC% content had more influence toward the number of tRNA gene numbers than the genome size.

Chloroplast tRNAs evolve from multiple common ancestors

We conducted a phylogenetic analysis by considering the tRNA genes of the chloroplast genome. The phylogenetic analysis revealed clear and distinct phylogenetic clusters of tRNAs. The phylogenetic tree showed two major distinct clusters suggesting their origin from multiple common ancestors (Fig. 8). In cluster I, anticodons GCU, GGA, UGA, GCC, UCC, CGU, CGA, GCC, CGA, CGU, UUC, UCU, CAU, UAA, CAA, GUA, UAG, UAU, UAUG, CAA, GCU, UCG, UCU, GAA, CUA, UAG, and GAG, grouped together, whereas, in cluster II, anticodons UUG, GUG, GCA, GAA, UUU, GUU, UGG, GGG, CCA, UGU, GGU, CAU, UAC, GCC, GUC, GAC, GAU, UUC, CGU, ACG, CCG, ACA, and UGC, grouped together (Fig. 8). The anticodons GAA (tRNAPhe), CAU (tRNAMet), GCC (tRNAGly), UUC (tRNAGlu), and CGU (tRNAThr) were shared in both clusters. The phylogenetic analysis of quadruplet anticodons revealed that quadruplet anticodon AUAA shares a phylogenetic relationship with UAUG anticodons, whereas the UGGG and GUUA anticodons fall in a distinct cluster (Fig. 9).

Figure 8
figure 8

Phylogenetic tree of chloroplast tRNAs. The phylogenetic tree shows two distinct major clusters named clusters I and II. The phylogenetic tree shows that chloroplast tRNAs have evolved from multiple common ancestors. In cluster I, anticodons GCC, CGU, CGA, UCU, CAA, and UAG, are found in more than one group, and the anticodons GCC, GCU, UUC, CAU, and GAA are found in both clusters, showing their evolution via duplication. The phylogenetic tree has been constructed using the neighbor-joining method using the Clustal W program.

Figure 9
figure 9

Phylogenetic tree of a putative quadruplet anticodon containing tRNAs with triplet codon–containing tRNAs. The phylogenetic grouping revealed that the quadruplet anticodons had evolved via the addition of a nucleotide preceding the third nucleotide of the triplet anticodons. The evolutionary history was inferred by using the Maximum Likelihood method based on the Tamura–Nei model. The tree with the highest log likelihood (−1053.93) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pair-wise distances, estimated by using the Maximum Composite Likelihood (MCL) approach and then selecting the topology with the superior log likelihood value. The tree was drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 147 nucleotide sequences. All positions with less than 95% site coverage were eliminated; that is, fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position. There were a total of 26 positions in the final dataset. Evolutionary analyses were conducted using the MEGA 7.

Genes undergo mutation, which is a common phenomenon. Although it was a common phenomenon in coding genes, non-coding genes also showed frequent mutation. Therefore, a transition/transversion bias study was conducted for the chloroplast tRNAs. The analysis revealed that transition predominates transversion (Supplementary Table 2). The transition/transversion bias was found to be the highest for tRNAAsn (R = 13.71), whereas tRNASer (1.22) had the lowest bias (Supplementary Table 2). The transition/transversion bias of tRNAAsn was followed by tRNATyr (11.51) and tRNATrp (8.63). Although tRNAArg, tRNALeu, and tRNASer encoded six Isoacceptors, their transition/transversion bias was comparatively lower than others (Supplementary Table 2).

Discussion

The chloroplast genome harbors several coding and non-coding sequences, including rRNA and tRNA10. These genetic elements and their potential to translate codons make them semi-autonomous organelles of the plant cell. A detailed genomic analysis of the chloroplast tRNA reveals that it does not encode all the 64 anticodons required for the tRNAs. The tRNAs with anticodons ACU (tRNASer), CUG (tRNAGln), GCG (tRNAArg), CUC (tRNAGlu), CCC (tRNAGly), and CGG (tRNAPro), are absent in the chloroplast genome of the studied species. Therefore, these anticodons can be classified as rare anticodons of the chloroplast genome. The ACU anticodon of tRNASer and the GCG anticodon of tRNAArg are from the hexa-isoacceptor group. In contrast, the CCC anticodon of tRNAGly and the CGG anticodon of tRNAPro are from the tetra-isoacceptor group. Therefore, a lack of these anticodons from their isoacceptor group does not make any difference in the genome, as other isoacceptors are available for their use to encode the codon. However, tRNAGln is encoded only by CUG and UUG anticodons, whereas tRNAGlu is encoded by the CUC and UUC anticodons. The lack of the CUG anticodon from tRNAGln and the CUC anticodon from tRNAGlu in the chloroplast genome has left these tRNA isotypes with only one choice of anticodon (Table 1). The lack of the CUG anticodon in tRNAGln and the CUC anticodon in tRNAGlu in the chloroplast genome may be due to a strong selection pressure to establish UUG (tRNAGln) and UUC (tRNAGlu) anticodons as the dominant anticodons. The tRNA anticodons followed by nucleotides CUx (x = any nucleotide) may have undergone a strong evolutionary pressure. Hence anticodons CUA, CUU, CUG, and CUC, encode only 197, 7, 0, and 0 anticodons, respectively, in the chloroplast genome (Table 1). However, the CAU anticodon encoding tRNAMet has been seen to have the highest percentage (5.47%) in the chloroplast genome (Supplementary File 1). The CAU anticodon of tRNAMet, of the nuclear-encoded genome has also been found in the highest (5.03%) abundance11, thus corroborating CAU, as the most abundant anticodon in the nuclear and chloroplast genomes. The anticodons CAU (tRNAMet), GUU (tRNAAsn), UGC (tRNAAla), and ACG (tRNAArg) have been found to encode more than 5% each of the total anticodons, suggesting the role of positive selection pressure in these anticodons (Supplementary File 1). However, at the isotype/isodecoder level, tRNALeu (10.27%) has been found to contain the highest percentage of anticodons, followed by tRNAIle (9.93%) and tRNAArg (7.96%) (Supplementary File 1). A similar level of abundance has been found for tRNALeu (7.80%) for the nuclear-encoded tRNA genes, reflecting a similarity in the anticodon abundance in the nuclear and chloroplast genomes11. However, an abundance of the nuclear-encoded anticodons tRNALeu is followed by tRNASer (7.66%), tRNAGly (7.52%), and tRNAArg (7.28%)11. Although tRNALeu is the highest encoding isotype/isodecoder in nuclear- (7.80%) and chloroplast (10.27%)-encoded genomes, there is a great difference in their percentage. The chloroplast-encoded CAU anticodon also encodes tRNAIle2 (4.93%). The CAU anticodon for tRNAfMet (0.33%) is also relatively abundant in the chloroplast genome. The tRNAfMet acts as an initiation anticodon in protein synthesis in mitochondria, bacteria, and chloroplasts, and the presence of tRNAfMet in the chloroplast genome is quite justified. However, only 709 tRNAfMet genes were found during the analysis suggesting that tRNAfMet is not a universal tRNA of the chloroplast genome. A majority percentage of the chloroplast genome does not encode tRNAfMet. A few chloroplast genomes encode the tRNAs for selenocysteine and pyrrolysine amino acids (Table 1). However, Zhao et al.12 has reported the absence of tRNASec in gymnosperm plants. The Sec amino acid specified by the UGA codon requires the presence of the selenocysteine insertion sequence (SECIS) element, and the Pyl amino acid encoded by the UAG codon requires the pyrrolysine insertion sequence (PYLIS)13. The presence of tRNA for encoding Sec and Pyl reflects that the chloroplast genome may have SECIS and PYLIS.

It was also very peculiar to see the loss of tRNA genes in the chloroplast genome of heterotrophic and mycoparasitic plants. Our previous study reported the loss of several other genes in the chloroplast genome in mycoparasitic and heterotrophic plants14. A similar is true for the tRNA genes as well. In the absence of tRNA genes in the chloroplast genome, the cell most probably uses the tRNA genes from the nuclear-encoded genome. However, the loss of tRNA genes in the chloroplast genome seems independent of the nuclear genome. The parasitic and heterotrophic plants require less effort to complete their lifecycle, as they are completely dependent on their host. Hence, they do not need a lot of genes for their function and hence, may be under constant pressure to eliminate genes. Therefore, these mycoparasitic and heterotrophic plants contain only 14 (CAU, CCA, GAA, GCA, GCU, GUA, GUC, GUG, GUU UCC, UCU, UUC, UUG, and UUU) anticodons in their chloroplast genome.

It is well known that the triplet genetic code is canonical and not universal15. The genetic code can be expanded, where specific codons can be reallocated to encode non-proteogenic amino acids. The tRNA genes undergo rapid changes to meet the translational demand of the cell16. Therefore, it is highly possible that tRNA can expand its anticodon nucleotide number. Our study helped us to discover the presence of quadruplet anticodons in the chloroplast genome of 91 plant species (Supplementary File 4). The quadruplet anticodons found in our study were UAUG, UGGG, AUAA, GCUA, and GUUA. Studies regarding functional quadruplet anticodons are reported in a few cases17,18,19,20,21,22,23,24,25. DeBenedictis et al., (2022) reported the translation of four-base codons in natural and synthetic systems25. They reported 20 isoacceptors can be changed to functional quadruplet tRNAs25. Anderson et al.17 reported the role of the quadruplet codon AGGA through changes in the tRNA anticodon loop to CUUCCUAAA in a suppressor tRNACUA. The suppression of the amber tRNA led to the encoding of homoglutamine (hGln), using the AGGA codon17. They also reported that quadruplet codons CCCU or CUAG could suppress the amber tRNA and allow the incorporation of unnatural amino acids into the protein in Escherichia coli17. Neumann et al.18, reported the encoding of unnatural amino acids through the evolution of the quadruplet anticodon in response to the amber codon tRNACUA. Chloramphenicol resistance was achieved when tRNAUCUUSer2 translated the AAGA codon, and tRNAUCCUSer2 translated the AGGA codon18. Niu et al.19 replaced tRNAPylCUA with the UCCU anticodon and generated tRNAPylUCCU, which recognized and suppressed the quadruplet codon AGGA. This provided a qualitative notion for suppressing the quadruplet codon through tRNAUCCU19. Most specifically, the presence of the quadruplet anticodon was associated with the suppression of the amber tRNA and the incorporation of the unnatural amino acid into the protein chain. The tRNAGCUA contained an additional G nucleotide prior to the tRNACUA anticodon, suggesting its role in the suppression of the amber codon. In the tRNAAsnGUUA anticodon, most probably, nucleotide A was incorporated after the GUU anticodon, as the tRNA with the GUU anticodon was grouped with the GUUA anticodon in the phylogenetic tree (Fig. 9). Similarly, in the UGGG anticodon, the G nucleotide got incorporated in the UGG anticodon, as they grouped with the UGG anticodon (Fig. 9). The GCUA anticodon was grouped with GCU anticodon, suggesting that the A nucleotide was incorporated at the fourth position of the GCU anticodon, which gave rise to the GCUA anticodon (Fig. 9). However, no such clue was found in the case of the UAUG and AUAA anticodons. Considering the incorporation of the additional nucleotide at the fourth position, we could speculate that the G nucleotide was most probably incorporated in the UAU anticodon and gave rise to the UAUG anticodon. Similarly, the A nucleotide was incorporated at the fourth position of the AUA anticodon to give rise to the AUAA anticodon. Although we found only five putative quadruplet anticodons, the genome could accommodate at least 256 quadruplet anticodons/codons in the cell (Table 2). We also found the presence of tRNAs, with only duplet anticodon, where one nucleotide was possibly deleted from the anticodon (Supplementary File 5). At least 13 species contained duplet anticodons in the tRNA of the chloroplast genome (Supplementary File 5).

The chloroplast encoding tRNAs were also found to encode the group I introns. These group I introns was conserved in their respective isotype/isodecoder groups (Table 2). From a total of 20 isotypes, 12 of them were found to encode the group I introns (Table 2). However, the group I intron of one isotype was not conserved with the intron of another isotype, reflecting the isotype-based conservation of the group I intron, in the tRNA.

It is well-reported that group I introns are found in tRNAs, bacteria, lower eukaryotes, and higher plants26,27,28. Some of the group I intron encode homing endonucleases catalyze intron mobility, thus facilitating the movement of the intron from one location to another and from one organism to another27. However, the incorporation of the group I intron in the tRNA gene is isotype-specific, as only 12 isotypes have been found to encode the intron, while eight isotypes do not have any intron in their tRNAs (Table 2). From the eight isotypes, tRNAHis, tRNAGln, tRNAAsp, tRNAAsn, and tRNAArg belong to the polar group, whereas, tRNATrp, tRNAPro, and tRNAVal belong to the non-polar group. This shows that the presence of the type I intron tends to be more toward the tRNA that encodes polar amino acids. Furthermore, it is seen that the chloroplast genome also encodes the putative spacer tRNAs (Fig. 2). It is reported that E. coli contains a spacer tRNA (tRNAAla and tRNAIle) that is present in the spacer region of the 16S and 23S rRNA29. The tRNAs, tRNAAla, and tRNAIle, have also been found in the spacer region of 16S and 23S rRNA, suggesting the presence of a spacer tRNA in the chloroplast genome. Although, in a majority of cases, tRNAAla and tRNAIle are the predominant spacer tRNAs; tRNAGlu can be the third most possible spacer tRNA of the chloroplast genome.

The analysis also revealed the presence of tRNA fragments (tRFs) in the chloroplast genome. We found at least 55 tRFs that belonged to ten tRNA isotypes (Supplementary File 8). These tRFs were putatively derived from the tRNA precursors or from the cleavage of mature tRNAs30. The tRFs were reported to control gene expression, translation control, transposon control, ncRNA, and DNA damage response8,30,31,32. Although we found ten different chloroplast-derived tRFs, the majority of them belonged to tRNAGlu and tRNAGly (Supplementary File 8). Among them are the, tRNAGluare tRF-1 type, tRNAGlyare tRF-5ˈ-type, and tRNAHis, tRNAThr, and tRNAValare tRF-3ˈ type (Supplementary File 8). Furthermore, we also noted the presence of a few putative tRNA-derived, stress-induced RNA (tiRNAs) fragments (tiRFs) in the chloroplast genome. The majority of the tiRFs were from tRNALys (UUU). For the first time, tiRFs were reported in the human fetus hepatic tissue and osteosarcoma cells33,34. These tiRFs could be generated in the cell under different stress conditions via cleavage of mature tRNAs33. However, their presence as independent nucleotide fragments in the annotated genome sequence reflected their independent presence in the genome. Although the cleavage of tRNAs to tiRFs was brought about by the enzyme angiogenin (an RNase superfamily)34 in the human cell, its counterpart in plants needs to be identified to understand its detailed functions. The 5ˈ-tiRNAAla and tiRNACys were reported to inhibit translation in rabbit reticulocytes34 suggesting their inhibitory role in protein translation.

This study also found a putative novel tRNA structure encoded by the chloroplast genome (Fig. 4). The tRNAGly (UCC) was found to contain a long nucleotide sequence between the D-arm and anticodon arm in several species. This long arm could most probably be an intron that might have been incorporated in between these two arms. The chloroplast tRNAs, which had lost the pseudouridine loop (Ψ), seemed to be metazoan mitochondrial-specific (Fig. 5). The loss of the Ψ-loop in tRNA was first reported in the 1970s35,36,37. Previous studies also reported the loss of the Ψ-arm and loop in nematode mitochondrial tRNA37. However, in the nematode mitochondrial tRNA, the Ψ-arm and loop were present in the tRNASer (GCU), whereas it had lost the Ψ-arm and loop in tRNASer (GCU and GGA) in the chloroplast genome (Fig. 5). The elongation factor (EF) Tu combined with GTP to form a complex that delivered the amino-acyl tRNA to the ribosome A site through binding of the acceptor,s arm and Ψ-arm38. In the absence of the Ψ-arm and loop in the tRNAs, it might use some alternative binding mode for EF-Tu39,40. Caenorhabditis elegans mitochondrial EF-Tu, it has around 60 amino acid extensions at the C-terminal end that might play an essential role in binding tRNAs that lack the Ψ-arm41,42. This also suggested that the mitochondrial ribosomal protein might have alternate binding sites for the truncated tRNA. Furthermore, the metazoan, mitochondria-specific, truncated tRNA in the chloroplast genome suggested that these tRNA genes might be shared by sub-cellular organelle chloroplast and mitochondria.

Evolutionary analysis revealed that chloroplast tRNAs are derived from multiple common ancestors (Fig. 8). The phylogenetic tree of the chloroplast tRNA shows two distinct clusters, which reflect their evolution from multiple common ancestors. In cluster I, anticodons GCC, CGU, CGA, UCU, CAA, and UAG make more than one group, whereas none of the anticodons from cluster II are found to make more than one group (Fig. 8). The anticodons GCC, GCU, UUC, CAU, and GAA are also found in both clusters (Fig. 8). This suggests that tRNAs with anticodons GCC, CGU, CGA, UCU, CAA, and UAG, of cluster I may have undergone vivid duplication and produced more than one anticodon group.

Conclusions

Chloroplast is a semiautonomous organelle of the plant and protist kingdom with a great potential to encode its own genome and protein translation machinery. The important tRNA molecules required for the protein translation process are well documented. The chloroplast genome encodes putative duplet, triplet, and quadruplet anticodons suggesting their role in recognizing duplet, triplet, and quadruplet codons in the mRNA. Mycoparasitic plants have lost their chloroplast genome to a large extent, thereby losing several chloroplast-encoded tRNA genes. Further, several of the chloroplast-encoded tRNA genes were found to encode introns, and the presence of intron in the chloroplasts genome suggests the presence of introns in the gene of their prokaryotic ancestor cyanobacteria. Further, the chloroplast genome is very selective and encoded only a few isoacceptors abundantly, while GCG, CUG, CUC, CCC, CGG, and ACU anticodons were found to be the rarest form of anticodons in the chloroplast genome. It is important to understand why chloroplast genomes do not encode tRNA with such anticodons. The tRNAs with quadruplet anticodons will enable us to provide a platform for the synthetic biologist to engineer tRNAs with quadruplet anticodons to understand the expansion of quadruplet genetic code.

Materials and methods

All the chloroplast genomes were downloaded from the National Center for Biotechnology Information (NCBI) database. In total, 5959 chloroplast genomes were used in this study. The downloaded chloroplast genomes were subjected to tRNA annotation. tRNA annotation was conducted using tRNAscan-SE 2.0, Aragorn, and the GeSeq-Annotation of the organellar genomes43,44,45. The Linux-based approach was used to annotate the chloroplast tRNA for tRNAscan-SE 2.0 and Aragorn. In the GenSeq-annotation of the organellar genome, the chloroplast genome files were uploaded with the following parameters; sequence source: plastid; annotation option: Annotate plastid inverted repeats; blat search: Default; annotate: CDS, tRNA, and rRNA; and third-party tRNA annotator: Aragorn v1.2.38, tRNAscan-SE v2.0.7. All the tRNA sequences generated from these three annotation pipelines were corroborated and used for further analysis. All the data obtained from tRNAscan-SE and Aragorn were further processed in an excel worksheet. The Organellar Genome Draw (OGDRAW) was used to draw the organellar genome map of the chloroplast genome46. The Genbank file was used to draw the chloroplast genome map in OGDRAW46.

Multiple sequence alignment

The intron sequences retrieved from the chloroplast tRNA were aligned to find the possible conserved structure. Multiple sequence alignment was conducted using the Multalin software (http://multalin.toulouse.inra.fr/multalin/) that uses hierarchical clustering47. Default parameters were used to construct the alignment.

Machine learning approach and statistical analysis

A machine learning approach was used to understand the role of the genome size and GC% content in the number of tRNA genes in the chloroplast genome. The random forest regression approach was used for this purpose. The following parameters were used in the random forest analysis: target tRNA gene number, predictor’s genome size, and GC% content; Plots: data split, out-of-bag error, predictive performance, the mean decrease in accuracy, and the total increase in node purity; tables: evaluation matrix; data split preference: sample 20% of all data; training and validation of data: 20% validation data. The training parameters were as follows; training data used per tree: 50%; predictor per split: auto; and the max tree: 100%. The machine-learning approach was studied using the JASP software version 0.16.1.048. The correlation plot for GC% content and tRNA was also conducted using the JASP 0.16.1.0 software. The following parameters were used for the correlation analysis, sample correlation coefficient: Pearson’s r and confidence interval: 95% (p < 0.05)48.

Phylogenetic tree

The tRNA sequences of the chloroplast genomes were taken to construct the phylogenetic tree. The phylogenetic tree was constructed using the Clustalw program (version 2.1) in a Linux-based environment. A neighbor-joining tree was constructed with 100 bootstrap replicates. The resulting file was saved in nwk file format and later uploaded in the iTOL Interactive Tree of life (version 6) to view tree49. The phylogenetic tree of the tRNA quadruplet anticodons, with other anticodons, was constructed using the MEGA software version 750. Prior to the construction of the phylogenetic tree, the tRNA sequences were subjected to multiple sequence alignments. Multiple sequence alignments were conducted using the MUSCLE software version 151. The resulting clustal file was converted to the MEGA file format (aln) using the MEGA 7 software50. The converted file was subjected to construct the phylogenetic tree in the MEGA 7 software, using the maximum-likelihood approach. The phylogenetic tree of the tRNA introns was also constructed using the MEGA 7 software with the same statistical parameters50. The Tamura-Nei model, with 500-bootstrap replicates was used for the analysis.