Introduction

The presence of organelles is a distinguishing feature of all eukaryotes. Of the organelles evolutionarily acquired by symbiosis and thus retaining independent genomes, the mitochondrion exists in nearly all extant eukaryotes. The pared-down mitochondrial genome generally retains a remarkably similar core subset of a few coding loci and rRNAs across eukaryotic species. However, the overall organization of these mitochondrial genomes, and the mechanisms for expression and processing of their RNA products vary widely.

Circular RNAs (circRNAs) are a class of RNAs in which the 5′ and 3′ ends of a single RNA molecule are covalently ligated. They were first reported as viroids1 but soon after discovered in the cytoplasm of HeLa cells2 and yeast mitochondria3. Once thought rare, circRNAs’ ubiquity in nature is being increasingly recognized4,5,6. The precise mechanism of ligation remains an active area of research, and different mechanisms may operate in different cellular contexts. Ribozymes, cellular ligases, and spliceosomes have all been shown to mediate RNA circularization7. Most identified circRNAs are formed from eukaryotic mRNA introns or exons spliced at normal splice sites and subsequently ligated6. However, a 2018 report revealed the presence of circRNAs formed from full-length mitochondrial mRNAs in the unicellular alga Chlamydomonas reinhardtii8. The mitochondrial mRNA of this alga, like that of a mammalian cell, lacks a 5′ UTR and thus a ribosome binding site. The authors speculated that circularization of Chlamydomonas mitochondrial mRNAs provided a mechanism for ribosome binding upstream of the translational start site. The same group subsequently explored whether such a mechanism might also exist in the mammalian mitochondrion9. This was determined to be unlikely, because while circRNAs derived from Chlamydomonas mitochondrial mRNAs were full length with a termination codon, those from mammalian cells were truncated and thus non-translatable. Very recently, widespread mitochondrial circRNAs originating from mRNAs were discovered in plants, and some were shown to be ribosome associated10. The possibility that diverse regulatory roles for mitochondrial circRNAs exist has recently been confirmed with studies demonstrating that mitochondrial circRNAs regulate protein import into the mitochondria of metazoans and the output of mitochondrial reactive oxygen species in human cells11,12. Despite these few discoveries of functional roles for mitochondrial circRNAs, their functions and their associated mechanisms remain largely elusive. Further, the landscape of mitochondrial species of circRNAs among eukaryotic species across the tree of life has been barely scratched. The possibility of functional mitochondrial circRNAs in less-studied eukaryotic species is intriguing.

Trypanosoma brucei is an insect-transmitted parasitic protozoan of mammals that is part of a larger group called the kinetoplastids. These flagellates possess one unique mitochondrion per cell with a concatenated network of DNA and associated proteins called the kinetoplast. The kinetoplast (k)DNA contains 20 genes: 2 rRNAs and 18 mRNAs that mostly code for proteins of mitochondrial electron transport chain complexes. The process of kDNA transcription into RNA and subsequent RNA maturation steps have been extensively studied and a working model has emerged13. Evidence suggests that transcription is monocistronic14, contrary to previously held hypotheses of polycistronic transcription15, but that there may be a degree of both cannot be ruled out. After transcription, T. brucei RNAs may possess a 5′ triphosphate, which on mRNAs is converted to a 5′ monophosphate by the 5′ pyrophosphohydrolase complex (PPsome)14. All RNA is processed by the mitochondrial 3′ processome (MPsome), which trims the 3′ end and can add oligouridine (U) tails16. mRNAs generally have short adenine (A)-rich A/U heterogenous tails added by kinetoplast poly(A) polymerase I (KPAP1) as part of the kinetoplast polyadenylation complex (KPAC) in combination with KRET1 of the MPsome. Interactions between the 5′ and 3′ bound protein complexes are hypothesized to create a molecule with protected 5′ and 3′ ends in close proximity and are a key regulatory step against mis-tailing and extensive 3′ trimming13. These interactions are thought to be mediated and possibly coordinated by the RNA editing substrate-binding complex (RESC)13. The RESC is one of the main protein complexes that is involved in a complicated editing process that takes place for 12 of the 18 mRNAs. This unique uridine insertion/deletion RNA editing requires a multitude of proteins, many of which are found in the RESC and the RNA editing catalytic complex (RECC). After editing is complete, the short A-rich mRNA tail is elongated to a more heterogenous A/U tail before translation takes place. A multitude of RNA binding proteins control this process, including 3′ trimming, tailing, and editing.

Here, we present evidence of mitochondrial mRNA-derived circRNAs in the kinetoplastids that are very evolutionarily distant from both plants and metazoans. Mitochondrial circRNA of multiple mRNA transcripts are present in both the insect and mammalian bloodstream life stage forms of T. brucei. Using circular RT-PCR and deep sequencing, we confirmed that the 3′ and 5′ ends of these mRNAs are covalently linked together. We have investigated trends in the UTRs and tails that these transcript populations contain and show that for each transcript examined, its circRNAs are a distinct subpopulation of total mRNA that possess different tail and 5′ UTR characteristics. We hypothesize that the differences between a transcript’s linear and circular mRNA populations are an indication of an alternative pathway in the mRNA lifecycle differing from its normal maturation and turnover pathways.

Results

Some mitochondrial mRNA are circularized

CircTAIL-seq is a technique used to Illumina sequence individual transcript PCR libraries of mitochondrial mRNA tails. The approach captures enough of each molecule’s 5′ and 3′ termini that the presence of editing can be confirmed if necessary17. Library preparation first requires circularizing total RNA with T4 RNA ligase prior to the generation of cDNA, after which outward facing primers (“Circular primers” in Fig. 1A) can generate PCR products capturing ligated 5′-3′ junction sequence (termed ligation point junction sequences). Individual sequence reads thus consist of the 3′ end of a transcript, its 3′ UTR, any 3′ tail, 5′ UTR, and the start of its coding region. Unlike traditional inward facing primers, circular primers will not produce a product unless the transcript is first circularized (Fig. 1A).

Figure 1
figure 1

A subset of Trypanosoma brucei mitochondrial transcripts are circularized in the cell. (A) Schematic of primer placement and PCR products from linear and circularized RNA. CDS, coding sequence. (B) Schematic of the electron transport chain and mitochondrial ribosome with transcripts that were chosen for PCR analysis shown above the complexes of which they are a part (above). Complex II, shown in grey, does not have any subunits encoded in the mitochondrial genome. Length of bar is proportional to relative length of mRNA prior to editing (below). Green, regions that do not require editing prior to translation; maroon, regions that require editing prior to translation. ND1, NADH dehydrogenase subunit 1; CYb, cytochrome b; CO1, cytochrome c oxidase subunit 1; A6, ATP synthase subunit 6; RPS12, ribosomal protein S12. (C) Agarose gel of PCR products using circular oriented primers (top panels in A) for three different T. brucei mitochondrial transcripts from cDNA made from RNA of ‘control’ cells as defined in “Materials and methods”. The smear of DNA is from the variation in tails lengths and is expected for these products. L1, T4 RNA ligase (Epicentre) added; L2, T4 RNA ligase (New England BioLabs) added; NL, no ligase added. Pre, pre-edited; e, edited. (D) Agarose gels of PCR products from RNA treated with ligase (+L) and without ligase (−L) of insect stage (top) and mammalian bloodstream stage (bottom) T. brucei for five mitochondrial transcripts. Transcripts that are edited have different primer sets to amplify their pre-edited (p) or edited (e) sequences. The blank well for +L CYbe in the bottom gel is expected because edited CYb transcripts are not found in bloodstream stage parasites. Transcripts that produced products from −L RNA are bolded. Brackets indicate expected sizes of PCR products. A6, CYb, and RPS12 were run on one gel for insect stage and one gel for bloodstream stage. CO1 and ND1 were run on the same gel for insect and bloodstream stage. Arrows indicate excess primers. Gel images are cropped. Original gels are presented in Fig. S5.

We are currently using circTAIL-seq to investigate tail populations from several mitochondrial-encoded transcripts (identified in Fig. 1B) on newly generated parasite insect life stage cell lines that, when induced, will result in either overexpression and/or RNAi silencing of proteins of interest. Without the inducer tetracycline in the media, the cells are effectively wild type, lacking overexpression or silencing of any genes, thus gene expression mirrors that of the parent cell line 29-13 (referred to as ‘control cells’, and described further in “Materials and methods”). In support of this, circTAIL-seq length and tail composition analysis from three of these cell lines in their uninduced state are practically indistinguishable (Fig. S1). In our current circTAIL-seq and in earlier experiments examining the effect of enzyme manipulation on tails by manually cloning and Sanger sequencing18, the uninduced state of these cells are considered wild-type.

In generating amplicons for circTAIL-seq, we included a control PCR in which no ligase was added to the RNA, and thus should produce no amplicon library. Unexpectedly, we found that for some transcripts, PCR products were formed and visible by agarose gel electrophoresis even without the addition of ligase (Fig. 1C). The amplicon produced differed from those that were produced from in vitro ligated RNA in that products appeared less abundant, and the length of the products was shorter overall. These products are not byproducts of DNA contamination, as we failed to obtain PCR products with the same circular primers when purified DNA was used as a template (using the mitochondrial transcript ND1, Fig. S2A). In contrast, an ND1 product of similar size to that of Fig. 1C was generated when DNase treated RNA was used to generate the cDNA in the upstream step (Fig. S2A). The most likely explanation for the existence of these PCR products generated in the absence of in vitro mRNA circularization is that a fraction of mitochondrial mRNAs is naturally and normally circularized in the parasite mitochondrion.

To determine the extent of such a phenomenon, we tested all our functional circular primer sets on cDNA of RNA from parent 29-13 insect life stage parasites and bloodstream-stage ‘single marker’ (SM) Lister 427 life stage parasites both with and without performing in vitro RNA circularization (Table S1, Fig. 1D). Normally, the circTAIL-seq workflow includes a cycle optimization step for each transcript (example shown in Fig. S2B) in which we identify the number of cycles resulting in products of the expected size range with a minimal number of unusable higher molecular weight concatenated PCR products. However, in vivo circularized transcripts can be more clearly observed by gel electrophoresis with additional PCR cycles, so to best compare these amplicons to in vitro circularized amplicons, we increased cycles for all transcripts. Thus, some lanes in Fig. 1D exhibit these high molecular weight artifacts as well as the expected products in the lower size range. For the insect life stage parasites, 4 of 5 tested transcripts produced products without ligase addition, and the products were shorter and qualitatively appeared less abundant than when ligase was added to the RNA. The result was the same for the bloodstream stage parasites.

Notably, for transcripts requiring editing, reliable products were detected for the pre-edited (p) but not the edited (e) versions of the tested mRNA. This suggests that if edited transcripts are circularized in vivo, their abundance is very low. Further, an amplicon was not generated for never-edited transcript CO1 unless it was first circularized in vitro, indicating that whether a transcript undergoes editing may be related to its likelihood of being circularized. To confirm that some transcripts in the untreated samples did not produce a product with circular primers, we increased the amount of PCR cycles further (Fig. S2C). For insect stage RNA without ligase addition, products were now detected for RPS12e, CYbe, and CO1. CO1 products were also detected in the untreated bloodstream form sample. But, given that higher cycles also produced a band for edited CYb for the ligase-treated bloodstream form RNA, which is known to be practically absent in this life stage of the parasites, the identity of these additional bands is unclear. We cloned and Sanger sequenced circular products from transcripts for NADH dehydrogenase subunit 1 (ND1), pre-edited cytochrome b (CYbp), and pre-edited ATP synthase subunit 6 (A6p) from insect life stage parasites. Sequencing confirmed that these products were indeed circRNA: 3′ ends of mRNAs, sometimes including a short tail sequence, covalently ligated to 5′ ends of the same transcript. We attempted to clone and sequence products from edited CYb (CYbe) that appeared when PCR cycles were increased, but were unsuccessful.

Circularized mRNA populations are distinct from total mRNA

With increased confidence that the amplicons generated were from native mitochondrial circRNA molecules, we further characterized them to distinguish how they may differ from linear mRNAs of the same transcript. We examined ligation point junction sequences between high-throughput sequencing reads from insect-stage parasite libraries obtained with and without in vitro circularization of RNA. We extracted the ligation point junction sequences from Illumina reads of libraries obtained from 29-13 strain parasites untreated RNA (sequences only from circRNAs) and in vitro circularized parasite RNA from control cells described above to identify any termini characteristics unique to circRNA (Table S2). The RNA collected from these two cell lines are directly comparable as the control cells were uninduced and thus behave the same as 29-13 cells. These latter read pools are believed to contain mostly linear mRNA reads due to the high amplicon abundance of these libraries relative to circRNA amplicons. However, as the linear to circularized mRNA ratios in parasites have not been quantified, we refer to these junction sequence populations as ‘total’ rather than ‘linear’. We examined libraries from three transcripts that range in their requirement for RNA editing. They were: never edited ND1; CYb, which is edited in only a small region; and A6, an extensively edited transcript (Fig. 1B). Since we were only able to verify true amplicons via Sanger sequencing using primers specific to the pre-edited forms of edited transcripts, we only examined CYb and A6 transcript reads that had not yet entered the editing pathway (CYbp and A6p).

For each transcript, characteristics of the tail populations between ligation point junction sequences of total and circRNA libraries were compared first. It was clear that tails from circRNAs were shorter and contained far fewer As (Fig. 2). For instance, the highest proportion of A6p tails are between 10 and 15 nucleotides in length when the library derived from total RNA was examined, whereas the A6p derived from naturally circularized mRNAs generally incorporated tails of less than five nucleotides. Tail length trends for CYbp and ND1 also indicate shorter tails in circRNA than total RNA (Fig. 2A). ND1 shows a less dramatic, but similar trend as A6p, with total RNA tail lengths shifting to longer tails when compared to circRNA tails, particularly in a greater proportion of tails that are longer than 20 nucleotides. A bimodal pattern of CYbp tail lengths shows circRNA tails are frequently less than 5 nucleotides (like A6p tails) or about 20 nucleotides, whereas total RNA tails are either less than 15 nucleotides or about 40 nucleotides. Nucleotide compositional differences were also observed. For each transcript’s tail population, Fig. 2B plots show the fraction of nucleotides that are A rather than U at each nucleotide position. For all three transcripts, tails of libraries derived from total mRNA possess regions where A content is as high as 75%. In contrast, tails derived from mRNA in its circular form have a dramatically lower A content, generally below 50% and maximally reaching 60% (Fig. 2B). To ensure that our characterizations of tail populations incorporated into circRNAs were not strain-dependent, we sequenced circRNA tail populations of the same transcripts in the EATRO164 Istar1 (EATRO164) cell line of a different lineage than that of the bloodstream and insect stage cells used thus far. Circularized mRNA tails from EATRO164 parasites exhibited similar trends in length and A composition to the circularized mRNA tails we found in 29-13 parasites, though there are some notable differences (Fig. S3). While CYbp and ND1 show similar patterns in tail length between the circularized mRNA in the two cell lines, A6p has a peak of tail lengths that are under 10 nucleotides in the EATRO164 cells, unlike that of 29-13 cells (Fig. S3A). Similarly, while the pattern of A composition is similar between circularized mRNA tails in 29-13 and EATRO164 cells, A6p tails have a higher proportion of A at the beginning of the tails in EATRO164 cells (Fig. S3B). Additionally, CYbp tails have a higher proportion of A in circularized mRNA in EATRO164 cells than in 29-13 cells. Even though there were differences between EATRO164 and 29-13 circularized mRNA tails, circularized mRNA tails in these two different cell lines were more similar to each other than when circularized mRNA tails were compared to the total mRNA tail population as is shown in Fig. 2.

Figure 2
figure 2

Trypanosoma brucei mitochondrial circularized mRNA (circRNA) possess shorter tails with lower adenine content than tails on the total mitochondrial mRNA of the same transcript. Populations of tails in circRNAs are shown in dark and light blue and tail populations from total mRNA are shown in dark and light orange. (A) Population density curves of tail lengths for three mitochondrial transcripts show the proportion of tails of each length. Y-axes are different because the area under the curves, included the first two nucleotides that are not shown, is equal to 1. (B) Line graph showing the proportions of nucleotides that are adenine at each position along tails for three mitochondrial transcripts. The thickness of the line represents the proportion of tails that are long enough to contribute to the data at each nucleotide position. As the line becomes thinner, the proportion of tails that are available to contribute to the adenine content data approaches zero because there are very few long tails. Data is shown from the second nucleotide in the tail to account for ambiguity in tail cutoff determination.

Differences in length and composition between total RNA and circRNA tails suggest that the manner in which tails are being added to the mRNAs is distinct. Two enzymes add nucleotides to mRNA 3′ ends: KPAP1 adds As and KRET1 adds Us. To investigate how A and U addition may be different for circRNA and total RNA, we used hidden Markov modelling (HMM)19. To define the number of nucleotide addition ‘states’ required for HMM, the two types of mRNA tails on kinetoplastid mitochondrial mRNAs must be considered. The first is a shorter tail found on all transcripts that consists of infrequent switching between Us and As, which results in a succession of homopolymers of As (state 1) and/or Us (state 2) rarely interrupted by a different nucleotide. For never-edited transcripts such as ND1, these shorter tails are sometimes elongated with a sequence characterized by frequent switching between As and Us presumably involving both KPAP1 and KRET1 combined activities (state 3). Thus, although it is appropriate to model pre-edited transcripts CYbp and A6p with a 2-state model, a 3-state model is required for never-edited ND1 (Fig. 3). For A6p (Fig. 3, top row) while the majority of tails in both the total and circularized RNA initially enter an A-adding state, a higher percentage of circRNA enter the U-adding state (34–35%) than the total RNA (24–26%). Additionally, more circRNA tails move from an A-adding state to the U-adding state (14–15%) than total RNA (3%). This is reflected in the lower percentage of As in the circRNA tails shown in Fig. 2B. Similar trends are shown for CYbp tails (Fig. 3, middle row). For ND1 (Fig. 3, bottom row), while most tails of the total RNA population move into the primarily U-adding state immediately (69–71%), the majority of circRNA tails (82–85%) instead start in an A/U addition state with A/U ratios that are atypical of this state (A:U, 7:3 is typical). Though this may first appear to contradict Fig. 2B, there is a higher percentage of Us added in the A/U- adding state of circRNA when compared to total RNA (shown as relative proportions of red (A) and blue (U) in the circle). Further, many tails quickly exit this state of addition and proceed to the U-adding state on circRNA molecules (23–30%).

Figure 3
figure 3

Hidden Markov modelling reveals differences in tail addition enzyme activities between total RNA and circRNA of the same transcript. Percentage of adenine addition (red) and uridine addition (blue) are shown for each state by proportion of the circle colored. The percentage of tails that move from one state to another is shown by thickness of arrows. Actual percentages are written next to arrows of particular interest. Grey, starting state; 1, primarily A-adding state; 2, primarily U-adding state; 3, A/U-adding state.

In addition to confirming data seen in Fig. 2, HMM gives us new insight into how the enzymes may be functioning differently during tail addition. The tails that stay in one state reflect processive enzyme activity, while the tails that move to another state suggest enzymes that are switching more frequently. In all three transcripts tested, the A-adding KPAP1 is less processive in circRNA tails than total RNA tails. This is shown by the decrease in the percentage of tails that stay in the A-adding state when comparing total RNA to circRNA (Fig. 3, 91% in total RNA to 82% in circRNA for A6p, 95–96% to 88–89% for CYbp, 93% to 85–88% for ND1). For the pre-edited transcripts (A6p and CYbp), we also see an increase in the percentage of tails that stay in the U-adding state (76–80% in total RNA to 83–84% in circRNA for A6p, 88% to 93% for CYbp), indicating that the U-adding KRET1 may be more processive. The HMM for ND1 indicates a different enzymatic activity pattern, with the predominant tail percentage starting in an atypical U-rich A/U-adding state, which indicates a less processive KRET1 that is often interrupted by KPAP1.

Considering that tails incorporated into circRNAs are shorter than those of the total mRNA population and that there are differences in tail-addition enzymatic activity between total and circRNAs, circularization may occur during mRNA processing around the step of tail addition initiation. If this is the case, we may expect a higher proportion of ligation point junctions in which the 3′ terminus entirely lacks an untemplated tail sequence in the circRNA libraries than in those of the total RNA-derived libraries. We determined the percentage of reads within ligation point junction regions that lack tail sequences, and found that for transcripts ND1 and A6p, circRNAs indeed had a higher percentage of reads that lacked evidence of 3′ mRNA tail addition prior to circularization (Fig. 4A). Ligation point junction sequences from mitochondrial circRNAs of EATRO164 parasites revealed that similar percentages lacked tails (Fig. S4A). However, in both parasite cell lines CYbp circRNAs were tailed at percentages similar to that of total CYbp mRNAs. In the circRNA-derived libraries, most of the CYbp tails were short (< 25 nucleotides) oligo(U) sequences (Fig. 2). The tendency for CYbp tails to have short strings of U more predominantly at the beginning of tails compared to other transcripts has been noted previously20. This difference in tail composition suggests that CYb mRNA processing may differ from that of other mitochondrial transcripts. The fact that this difference was captured in the circRNA-derived sequence library suggests that differences in processing occur prior to CYb being vulnerable to covalent self-circularization.

Figure 4
figure 4

Trypanosoma brucei circularized mRNAs (circRNAs) have distinctly different 5′ termini and fewer tails than total mRNAs of the same transcript. (A) Percent of sequenced ligation point junction reads that possess no untemplated A or U additions between the ligated ends of the 3′ and 5′ UTRs. (B) 5′ UTR lengths of circRNA (blue) and total mRNA populations (orange) for each transcript. (C) 3′ UTR lengths of circRNA (blue) and total mRNA populations (orange) for each transcript. For (B) and (C), populations of 3′–5′ junctions that contain nontemplated tail sequence are analyzed separately from populations that do not. 5′ UTR lengths are counted starting with the first nucleotide after the tail ends until the first nucleotide of the start codon. 3′ UTR lengths are counted so that the last nucleotide after the stop codon is position +1. Replicate 1, solid line; replicate 2, dotted line.

Since tail addition on mitochondrial mRNAs that eventually became circularized was curtailed and altered, we considered that mRNAs vulnerable to circularization may display other types of altered maturation. 5′ and 3′ cleavage or trimming are also mitochondrial mRNA maturation steps, so we investigated the UTR lengths of circRNAs and total mRNAs. Using our high-throughput sequencing reads of ligation point junction regions, we determined the 5′ and 3′ UTR lengths between a transcript’s circularized mRNA and total mRNA population. As tail addition may be linked to proper UTR processing, we separately addressed the UTR length comparison of mRNAs that possess tail sequence and those that do not (Fig. 4B and C). We found 5′ UTR lengths were shorter in a transcript’s circRNA population compared its total RNA population when there was no tail present (Fig. 4B, top panel), and this difference increased when there was a tail present (Fig. 4B, bottom panel). However, we surprisingly found that the 3′ UTR length was consistent between a transcript’s circularized and total mRNA populations regardless of whether a tail was present (Fig. 4C). UTRs of circRNAs from the EATRO164 cell line were comparable in length to those of strain 29-13 (Fig. S4B and C), with the exception of some differences in 3′ UTRs of A6p that may be related to sequence differences between the strains.

Considering our discovery that circRNA ligated termini have characteristics that differ from termini of linear mRNAs, it was important to verify the robustness of our sequence dataset from those libraries. We recognized that high-throughput approaches utilizing PCR for library generation can suffer from library low diversity, resulting in less information content in the sequencing output than read numbers would suggest. We previously demonstrated that libraries of traditional circTAIL-seq mitochondrial transcripts employing in vitro circularization of linear mitochondrial transcripts from T. brucei were sufficiently diverse17. Knowing that starting template numbers of circularized molecules used to generate circRNA libraries are likely low, we examined our circRNA library read pools for sequence diversity. First, we looked at the incidence of reads that contained identical junction regions (which include all sequence between the primer annealing positions) (Table S2). The sequences appearing the most in each library that aligned with the DNA template were always 3′-5′ termini direct ligations lacking tail sequence or 5′ and 3′ ends interspersed with a short mono-nucleotide tail sequence. This is not surprising because our sequencing captures a snapshot of the processing state of the population of a transcript. All molecules on their way to maturity will have to pass through a state of no tail or short oligo tail. Further, in reads lacking tails or possessing only short oligonucleotides, the lengths of 3′ and 5′ UTRs did vary within the range shown in Fig. 4, indicating that unique molecules were being represented in the read population. We did not see highly duplicated reads with tails that had longer, unique A/U patterns. The repeated appearance of such a sequence among the reads would suggest a single template molecule amplified repeatedly. Additionally, we identified the quartiles for each library for all reads (Table S2) and found that while there are highly repeated sequences as identified above, most junction regions were duplicated only 2–9 times. We conclude that while the total number of sequences analyzed in our circRNA libraries were typically lower than those of the total RNA libraries (Table S2), these populations appear to be sufficiently diverse for the analyses that we performed.

In summary, when compared to a transcript’s total mRNA population, its circRNA population exhibits shortened 5′ UTRs and altered non-templated tail characteristics, but consistent 3′ UTR termini. Together, these data support an mRNA processing model in which (1) modification of the 3′ and 5′ ends of mRNA are physically linked, and (2) circularization happens after the MPsome has trimmed the 3′ UTR and added an initial oligo(U) tail, but before the oligo(U) tails acquire A-rich extensions.

Discussion

We have described, for the first time, kinetoplastid mitochondrial-encoded circular mRNAs. These circRNAs comprise part of the total mRNA population of at least 4 of the 5 tested transcripts in both insect- and bloodstream-stage parasites. However, for transcripts requiring editing, only the pre-edited form showed convincing evidence of circularization (Fig. 1D). After high-throughput sequencing three transcripts in two different strains, we confirmed that mitochondrial circRNAs in the two strains have similar characteristics (Figs. S3 and S4). We determined that these circRNAs are a subpopulation of mitochondrial mRNAs with differing 3′ non-encoded tail and 5′ UTR characteristics relative to their linear counterparts (Figs. 2 and 4). Enzymatic activity on circRNAs during tail addition is different, in that it is more limited than the range of tail addition activities acting on total RNA (Fig. 3). Such population-level differences suggest that the sequenced ligation point junctions that we have sequenced are not an artifact of our circTAIL-seq amplification and sequencing protocols but rather come from molecules naturally existing in T. brucei mitochondria. The presence of circRNAs suggest a previously unidentified pathway in the T. brucei mitochondrial mRNA lifecycle.

Since a primary mechanism of trypanosome mitochondrial mRNA processing has been proposed and proteins involved identified, we can derive models for how and when circRNAs are formed. Figure 5 is one model. Processing complexes on mRNA 5′ and 3′ ends are thought to bring the mRNA termini into close proximity through protein–protein interactions13. 5′ and 3′ end proximity is theoretically advantageous for subsequent mRNA covalent circularization.

Figure 5
figure 5

Model of potential mRNA maturation pathways for translatable transcripts (left) and circularized transcripts (right). PPsome, 5′ pyrophosphohydrolase complex; MPsome, mitochondrial 3′ processome; RESC, RNA editing substrate-binding complex; KPAC, kinetoplast polyadenylation complex; RECC, RNA editing catalytic complex.

We suggest that during processing, mRNA ends are brought together earlier than what is currently believed. Interactions between proteins bound to termini are postulated to occur following short A-rich tail addition13. However, while sequenced circRNAs have similar 3′ UTR lengths as those of the total transcript population, they exhibit fewer non-encoded 3′ nucleotide additions (Fig. 4). Additionally, when tailed, they possess tails of high U content rather than typical linear mRNA A-rich tails (Fig. 2). Thus, protein interactions between the 3′ and 5′ bound complexes may bring termini together before global KPAP1 A-tailing of mRNAs but after MPsome trimming and U-tailing. A-rich tails may not be necessary to promote a protective mRNA configuration, as previously understood.

Factors driving processing toward covalent circularization as opposed to the normal steps of RNA editing and translation could include the shortened or altered length of 5′ UTRs of circRNAs that we observed. This 5′ UTR variability could arise from exoribonuclease or endoribonuclease activity, or alternative transcription start sites. Although no T. brucei mitochondrial 5′ to 3′ exoribonuclease has been experimentally confirmed, two mitochondrial exoribonucleases with known 3′ to 5′ activity are also predicted to have 5′ to 3′ activity21,22. Under a polycistronic transcription model, endonucleolytic cleavages separating mRNAs would parallel the exon splicing that precedes non-mitochondrial circRNA formation in other eukaryotes6. Conversely, monocistronic transcription posits 5′ transcription start sites for each mitochondrial gene, thus the different 5′ ends in circRNA could arise from transcription start site alterations14. Regardless of their origins, 5′ UTR length differences could influence the composition of 5′ UTR-bound proteins (Fig. 5). Altered 5′ UTR protein composition could delay 3′ A-rich tail addition, leaving the mRNA vulnerable to covalent circularization by ligases.

Two ligases function in editing, and editing machinery interacts with 5′ and 3′ complexes of all transcripts, not just those that are edited14. Thus, the catalytic RECC that contains these ligases may be positioned such that it can circularize transcripts with altered 5′ UTRs. Further evidence that mRNA circularization may be carried out by editing ligases is that the pre-edited transcripts CYb and A6, which would need to interact substantially with editing complexes to initiate editing, seem to have more circRNAs than the never edited ND1. We believe this to be the case because the percentage of reads aligning to the analyzed transcript is lower in the ND1 circRNA libraries (Table S2) than in the other libraries. Although beyond the scope of this study, quantitatively testing this hypothesis using RT-qPCR would be ideal and designing these experiments is planned for future work.

Even though mitochondrial mRNA circularization likely involves factors and enzymes normally thought to execute other functions, it does not mean that it is unregulated or that it does not have a function. Possible functional roles for mitochondrial circRNAs include a role in mRNA degradation. Atypical 5′ UTR lengths could indicate a compromised mRNA, and circularization may mark such mRNA for degradation. Various methods for circRNA degradation have been proposed23, though some circRNAs are notably more stable than their linear counterparts24,25. If T. brucei mitochondrial circRNAs are stable rather than unstable, they could serve as regulators. In other systems circRNAs regulate various processes by acting as microRNA sponges26, affecting mitochondrial protein import11, and regulating translation or transcription of certain proteins27. Though recently reported circularized mitochondrial mRNA have been shown to be translated in plants10, in T. brucei this is unlikely because we primarily observe circRNAs formed from pre-edited transcripts that do not have open reading frames. Additionally, since circularization seems to halt proper mRNA tailing and is associated with shorter and more variable 5′ UTRs, this suggests circRNAs are not later translated, but are rather directed into a new RNA pathway entirely.

In this study, we show that a subset of T. brucei mitochondrial mRNAs are circularized early in the mRNA maturation lifecycle. The mechanism of circularization and potential role of circRNA in the mitochondrial mRNA lifecycle is intriguing and furthers our understanding of the temporal steps of trypanosome mRNA maturation.

Materials and methods

Parasite cell culture and cell lines

Trypanosoma brucei 29-13 cells (Lister 427 strain expressing T7 polymerase and tetracycline repressor), and EATRO164 Istar1 (EATRO164) insect-stage cells were grown in SDM-79 at 27 °C in 5% CO2. G418 and hygromycin were added to 29-13 cultures. Bloodstream-stage ‘single marker’ (SM) Lister 427 T. brucei was grown at 37 °C in 5% CO2 in G418-supplemented HMI-9. All cell lines were a gift from Dr. Laurie Read, University at Buffalo in 2013. Most experiments of mitochondrial circRNAs, including all deep sequencing of circRNAs, were performed on 29-13, EATRO164, and SM parasite RNA. However, mitochondrial circRNAs were first identified with gel electrophoresis and Sanger sequencing on a 29-13 strain with an additional nonphenotypic genetic modification. Currently unpublished and termed “control”, this cell line under tetracycline induction will express the protein KPAP1 from an exogenous locus and simultaneously silence endogenous KPAP1. Lacking tetracycline (uninduced), “control” cells are phenotypically identical to 29-13 in growth and other parameters. CircTAIL-seq (including an in vitro ligation step) of total transcript mRNA (likely majority non-circular) was performed on this cell line. These same data from “control” serve as wild-type in a different study to be published elsewhere. “Control” cell line was maintained with G418, hygromycin, puromycin, and phleomycin.

RNA manipulations and PCR product gel electrophoresis analysis

RNA was collected at approximately the same cell densities for each cell line as described in28, circularized using T4 RNA ligase 1 (Epicentre or New England BioLabs) and reverse transcribed using Superscript III (Invitrogen) and gene specific primers as in17. Table S1 lists reverse transcription primers. Water was substituted for ligase in ‘No ligase’ reactions. For PCR utilizing circular primers (listed in Table S1), product was generated with KAPA2G Robust polymerase (KAPA biosystems), manufacturer provided buffer per manufacturer’s protocol. Variable amounts of cycles and cDNA were used because the concentration of cDNA varied greatly depending on the preceding manipulations: a 1:10 dilution of cDNA and 33 cycles was used for Fig. 1C; a 1:4 dilution of cDNA was used in PCRs of ligase and mock ligase treated samples using 30 cycles in Fig. 1D and 35 cycles in Fig. S2C; and undiluted cDNA and 28 cycles was used for Fig. S2A. For PCR conditions, after an initial incubation at 95 °C for 3 min, cycling was performed as stated as follows: 95 °C for 0:15, annealing temperature listed in Table S1 0:15, and extension at 72 °C for 0:30. A final extension step at 72 °C for 1 min was performed. All reactions were electrophoresed on 1.5% ultrapure agarose gels.

Agarose gel imaging equipment and settings

A LiCor Odyssey Fc Imager was used to image agarose gels stained with ethidium bromide using Image Studio Amalysis software, version 5.2.5. Gels were imaged for 30 s on the 600 channel. Contrast and brightness were adjusted for the entire gel until bands were visible while minimizing background color. Original gel images are presented in Fig. S5.

Cloning and Sanger sequencing

50 μl PCR products generated from no-ligase-added cDNA primed with circular primers for ND1, CYbp, and A6p (Table S1) were run on agarose gels and purified using Freeze ‘N Squeeze DNA gel extraction spin columns (Bio-Rad). For manual sequencing, purified PCR products were cloned using NEB PCR Cloning Kit (New England BioLabs). Clones were Sanger sequenced at the University of Minnesota Genomics Center.

CircTAIL-seq

CircTAIL-seq was performed as described in17 with modifications to PCR cycling conditions, sequencing read lengths obtained, and parts of downstream read processing. Reverse transcription primers, Illumina primers, and annealing temperatures are listed in Table S1. For cDNA generated from RNA treated with ligase, a 1:10 dilution of cDNA was used in PCRs, as per the circTAIL-seq protocol in17, and PCR cycling conditions were specific for each primer set (Table S1) to decrease the amount of concatenated high molecular weight products as is demonstrated in Fig. S2B for A6p. For cDNA generated from RNA not treated with ligase, cDNA was not diluted prior to PCR, and PCRs were cycled 34 times to increase the abundance of the products to be sequenced. The University of Minnesota Genomics Center (UMGC) performed quality control on all libraries by assessing quality and quantity on an Agilent BioAnalyzer and performing KapaQC. Amplicon libraries were then sequenced at UMGC on an Illumina MiSeq analyzer using the MiSeq V2 Chemistry 150 PE kit, acquiring 150 bp paired-end reads. One exception, CYbp control replicate 1, was sequenced using MiSeq V2 Chemistry 250 PE kit. Eight libraries were run in one lane and sorted using barcodes when possible and unique gene primer sequences otherwise. Table S2 lists the numbers of raw reads for each sequence file and, following processing, the number of merged reads utilized for downstream analysis of tails and UTR lengths. Though between 96 and 99% of reads were maintained after merging, removing reads that did not align to UTR templates and those that were partially edited resulted in 72–91% of reads used in downstream analysis for most libraries. The exceptions were the ND1 libraries, in which only 51–57% of reads were used for the control libraries and between 8 and 19% were used in the circularized libraries. Initial downstream read processing was performed as in17. Briefly, read quality was assessed with FastQC, paired reads were merged using PEAR, Illumina barcodes were trimmed with Trimmomatic, reads were reverse complemented and converted to fasta files using the FastX toolkit, and merged read quality was checked using FastQC. An updated tail identification protocol was used to identify which nucleotides constituted tail sequence for each read and which were part of the 3′ and 5′ UTRs surrounding any tail. The Needleman Wunch algorithm29, coded in C, was modified for the alignments, while all downstream analysis was done by Python scripts (all code available at https://github.com/Pooryamb/circTAIL). Reads within files were aligned to the DNA sequence corresponding to both the 3′ sequence immediately downstream of the primer binding site to identify the last 3′ templated nucleotide for each read and to the 5′ region just upstream of the 5′ primer binding site to identify the 5′ last untemplated nucleotide. These alignments returned files of 3′ ends, 5′ ends, and non-encoded tails. Because the aligner predicts the longest possible UTR, tails were occasionally mis-aligned to A/U rich UTRs. To address this, we implemented a pruning step in circTAIL-seq alignment. Pruning starts from the end of the alignment and, based on a transcript-specific scoring matrix, terminates the 3′ and 5′ UTR alignment at the point where an A/U rich region follows a gap or mismatch that cannot be attributed to RNA insertion/deletion editing. For EATRO164 A6p, the DNA template used for alignment was modified to accommodate differences from the 29-13 A6p DNA sequence in the 5′ terminus. All graphs were made using R, version 4.2.1.

Hidden Markov modelling

Hidden Markov modelling (HMM) was performed as described in30 and19.