Expression profiling and in situ screening of circular RNAs in human tissues

Circular RNAs (circRNAs) were recently discovered as a class of widely expressed noncoding RNA and have been implicated in regulation of gene expression. However, the function of the majority of circRNAs remains unknown. Studies of circRNAs have been hampered by a lack of essential approaches for detection, quantification and visualization. We therefore developed a target-enrichment sequencing method suitable for screening of circRNAs and their linear counterparts in large numbers of samples. We also applied padlock probes and in situ sequencing to visualize and determine circRNA localization in human brain tissue at the subcellular level. We measured circRNA abundance across different human samples and tissues. Our results highlight the potential of this RNA class to act as a specific diagnostic marker in blood and serum, by detection of circRNAs from genes exclusively expressed in the brain. The powerful and scalable tools we present will enable studies of circRNA function and facilitate screening of circRNAs as diagnostic biomarkers.

tissues [27][28][29] . CircRNAs are highly stable as compared to linear RNA transcripts due to their closed structure that prevents their cleavage and degradation by exonucleases, and can be detected in saliva, exosomes and blood [30][31][32][33] . They are therefore considered as a promising new class of biomarkers 31,33,34 .
Although thousands of circRNAs have been identified using total RNA-seq data, the detection and characterization of circRNAs remain challenging. CircRNAs represent a small fraction of the total RNA pool and the majority of circRNAs are expressed at low levels. In addition, circRNAs are made up of the same sequence as their linear counterparts and can only be differentiated by the sequence reads spanning the back-splice junctions. Such back-splice junction reads are rare in RNA-seq data, making up less than 1/1000 reads in a typical sequencing library. A plethora of bioinformatics approaches have been developed to predict circRNAs and measure their expression from RNA-seq data 21,35,36 , yet the efficiency of circRNA calls and reproducibility across different approaches remain relatively low. RNase R treatment, a process that digests all linear RNAs but preserves circRNAs, followed by PCR or RNA-seq has been instrumental for validating circRNAs identified from total RNA-seq. However, this strategy hampers the ability to quantify the abundance of circRNAs in relation to their cognate linear RNA. Although some studies attempted to define the localization of circRNAs in cells and tissues using available in situ methods such as FISH 12,37 , it is still challenging to define the localization of circRNAs at the subcellular level. This is mainly due to their generally low expression levels compared to mRNA, their relatively short sequence and their sequence similarity with their cognate RNA. Therefore, developing tools to study subcellular localization of circRNAs would provide novel insights into their function. For instance, if circRNAs are enriched in a particular subcellular compartment different from their cognate RNA, the spatial discrepancy between the two molecules could indicate independent functions and regulatory pathways.
Although circRNAs are currently under intensive investigation, there is a lack of approaches to study their function, including high-throughput methods to quantify their abundance in relation to the expression of their cognate linear mRNA. Also there is a lack of methods to define the spatio-temporal expression of circRNAs across tissues, within tissues, and at the subcellular level. To address these methodological issues, we developed a robust target-enrichment sequencing approach to simultaneously measure the abundance of circRNA expression and their cognate linear transcripts across many samples. Furthermore, we adapted the padlock probes to visualize circRNA and their correlated linear transcript in situ, and to determine their subcellular localization in cells and tissues.

Results
In this study, we developed an efficient and cost effective sequencing strategy to accurately measure circRNA expression from various input materials in large numbers of samples. This strategy, here called the circRNA AmpliSeq panel, is based on target enrichment of circRNAs and their cognate linear isoforms using the Ion AmpliSeq ™ target-enrichment sequencing panel (Fig. 1A). We also report improvements for the padlock method combined with in situ sequencing to define the spatio-temporal expression of circRNAs at the subcellular level (Fig. 1B).

CircRNA AmpliSeq panel design and validation.
In order to identify targets to include in the AmpliSeq panel enriching for selected circular and linear RNAs, we first performed standard RNA-seq on total RNA extracted from human fetal frontal cortex, fetal liver, fetal heart, and SHSY-5Y cells. To identify circRNAs expressed in these tissues, we then called sequence reads mapping to back-splice junctions from known exon junctions using the circRNA_finder pipeline described in 9 . For initial testing, we focused on brain tissues, as circRNAs are highly abundant and diverse in brain compared to other tissues. From the circRNAs identified in the frontal cortex, we selected 129 circRNAs expressed at high levels as well as 127 linear RNA targets from the same genes for the design of the circRNA AmpliSeq panel. Additionally, we included 15 negative control targets, i.e. back-splice junctions with no support from the RNA-seq data and with no circRNA reported in literature (see Supplementary materials for the targets and their primer sequences).

Figure 1. (A) A schematic illustration for the design of the primers used to enrich for circRNA and their corresponding mRNA in the AmpliSeq panel. Target specific primers for sequencing-library preparation included: 129 circRNAs (primers facing outwards), 127 linear RNAs (primers facing inwards) and 15 negative controls (primers facing outwards in regions where no circRNAs are annotated). CircRNA targets were selected based on human brain RNA-seq data. (B) A schematic illustration for the detection of circRNA using padlock probes and RCA. RNA is first reverse transcribed followed by hybridization of the padlock probes. Upon perfect hybridization to the target sequence, the 5′ and 3′ ends of the padlock probe come into proximity and are circularized by target-dependent ligation. Circularized padlock probes can be amplified by RCA in situ and visualized using fluorescence microscopy or screened using in situ sequencing.
A schematic representation of the target enrichment approach for the circRNA and linear RNA is described in Fig. 1A. To test if our targeted approach successfully captures circRNAs, we performed a pilot sequencing experiment using the circRNA panel on total RNA extracted from human adult frontal cortex tissues. We also used the panel to assay technical replicates of total RNA extracted from the human neuroblastoma SHSY-5Y cell line. The sequencing generated approximately 8 million reads per sample, with 97% of the reads mapping to the panel targets. Based on sequencing read counts for all targets in the panel, we found high correlation in circRNA and linear RNA expression between technical replicates (R² = 0.963). We also found the circRNA to linear RNA ratio to be highly correlated between these replicates, highlighting the reproducibility of the circRNA AmpliSeq panel (Fig. 2A and Supplementary Figure 1). To evaluate the efficiency of the circRNA panel as compared with circRNA detection from total RNA-seq data, we compared the number of sequence reads for all individual targets in the circRNA panel to the number of reads mapping to the same targets in the standard total RNA-seq data from the same samples. For this analysis, we aligned the RNA-seq reads to the exact same target sequences as defined by the AmpliSeq panel and thereby obtained an RNA-seq count for each target. Then, we directly compared the resulting RNA-seq counts to the corresponding counts obtained from the circRNA panel (Fig. 2B). Our results show that the target enrichment leads to a read count several orders of magnitude higher in the circRNA panel compared with RNA-seq (Fig. 2C and Supplementary Figure S2), but the results of the two approaches are still correlated, with an R² value of 0.332 and values ranging between 0.114 and 0.495 for the rest of the samples. The targeted enrichment and subsequent higher read coverage provide an increased sensitivity for detecting circRNAs compared to RNA-seq. Since many circRNAs are expressed at low levels, this increased sensitivity provides an advantage for detection and quantification of circRNA, which would otherwise require deep total RNA-sequencing or RNase R treatment for detection. This is relevant when screening for targets of interest in new tissues or samples. It also makes the panel an attractive choice for screening large cohorts after an initial circRNA discovery effort based on total RNA-seq. Our results also indicate that several circRNAs that were not detected or detected at very low levels using standard RNA-seq are readily captured using the panel, and these may explain the relatively low correlation values between the two methods. To validate these findings using an independent method, we used both the circRNA panel and RNA-seq data from SHSY-5Y cells and selected four circRNA targets (SLAIN2, RAB6A, SPECC1 and ACVR2A) with no or low coverage in our total RNA-seq and in previously published data 11 , but that were detected at high levels with the circRNA panel (Fig. 2C). In addition, we included three targets (ZKSCAN1, ANKRD12 and BPTF), which had coverage in both total RNA-seq and in the circRNA panel. We used qrtPCR to measure their actual expression levels in the original RNA sample, and found the qrtPCR results to be in agreement and correlate better with the circRNA panel data compared to RNA-seq (Fig. 2D and Supplementary Figure S3).

Figure 2. (A) Correlation of expression levels (raw read counts) obtained from the circRNA AmpliSeq panel, for two replicate RNA samples from the SHSY-5Y cell line. The black dots correspond to linear RNA targets, red dots correspond to circular RNA targets, and yellow dots correspond to negative control targets. (B) To make the RNA sequencing data comparable to results obtained from the circRNA panel, we aligned RNA-seq reads to the reference sequences obtained from the AmpliSeq panel design and required at least 50 bases to be mapped over the middle of each amplicon in order to call a match (indicated by the dotted lines). This implies that the RNA-seq read covers at least 25 bp at each side of the back-splice junction for circRNAs, and 50 consecutive bases for linear RNAs. (C) Quantification of circRNA expression levels from total RNA-seq (y-axis) and the AmpliSeq circRNA panel (x-axis) in SHSY-5Y RNA. The axis scales are logarithmic and the value 1 was added to each target, implying that non-expressed targets have a value of 0 in the plot. Each dot represents a target amplicon in the circRNA panel, either a circular RNA (red) or a linear RNA (black). (D) Validation of circRNA expression in SHSY-5Y cells. The expression levels of circRNAs from SLAIN2, RAB6A, SPECC1, ACVR2A, ZKSCAN1, ANKRD12 and BPTF (represented by the green dots highlighted in 2B) and a Neg control (negative control target from the circRNA panel, as in (A)) were measured using qrtPCR. All expression values were first normalized to the level of β-actin and then circRNA expression was calculated as a fold of the expression levels from the Neg control using the ΔΔCt method. Expression levels represent mean values of the technical replicates and error bars are ± SD. All expression levels are presented as log10.
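
As a minimal sketch of how the replicate and cross-method comparisons above could be reproduced, the following Python snippet computes the squared Pearson correlation (R²) on raw counts and on log10(count + 1) values, mirroring the +1 offset described for the logarithmic axes in Fig. 2C. The read counts and function names are illustrative assumptions, not data or code from this study, and the text does not state which transformation, if any, was applied before the reported R² values were calculated.

```python
import math


def log10_plus_one(counts):
    """Return log10(count + 1) for each raw read count (0 reads maps to 0)."""
    return [math.log10(c + 1) for c in counts]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Made-up read counts for five panel targets in two technical replicates.
replicate_1 = [0, 120, 4500, 38, 90000]
replicate_2 = [0, 135, 4100, 45, 87000]

r2_raw = r_squared(replicate_1, replicate_2)
r2_log = r_squared(log10_plus_one(replicate_1), log10_plus_one(replicate_2))
print(f"R^2 on raw counts: {r2_raw:.3f}")
print(f"R^2 on log10(count + 1): {r2_log:.3f}")
```

Raw-count correlations tend to be dominated by the few most abundant targets, so reporting both values can help judge agreement for low-abundance circRNAs.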

CircRNA expression in human tissues.
To explore circRNA expression in a large range of human tissues, we applied the circRNA panel to RNA extracted from a set of human adult and fetal tissues including frontal lobe, liver, heart, placenta and blood (n = 12, Supplementary Materials). To obtain an overview of circRNA expression patterns in the different tissues, we performed hierarchical clustering of the expression counts obtained from the circRNA panel for all the samples (Fig. 3A). In agreement with previously published data 8,13 , we found that circRNAs are expressed in all tissues with distinguishable expression patterns among the different tissues. Samples cluster by tissue type, with the exception of liver, where adult and fetal samples do not cluster together. We also used the circRNA panel to measure the ratio between circRNAs and their cognate linear RNA and found that circRNA/linear RNA ratios vary between samples. For some genes and in some tissue types, circRNAs were expressed at higher levels than their corresponding linear RNA. In Fig. 3B, we show two genes, SETD3 and RAB6A, where both the absolute expression of circRNA and the ratio between circRNAs and their linear isoforms differ between the tissues. The complete table containing expression counts for linear and circular RNAs in all samples is available as a Supplementary Data file.

CircRNA expression in FFPE and serum samples.
We then tested whether we were able to enrich for and detect circRNA from specimens with low RNA quality such as formalin-fixed paraffin embedded (FFPE) tissues or low RNA content such as serum. Total RNA was first extracted from 6 FFPE tissues and 1 serum sample (Supplementary Data file) and then sequenced using the circRNA panel. In this experiment, we also performed RNA sequencing using the circRNA panel following RNase R treatment of RNA from one of the adult frontal cortex samples (Brain 5) as a control for the panel specificity. The relative circRNA expression (circRNA reads/(LinOut reads + circRNA reads)) in each sample is shown in Fig. 4A. As expected, the RNase R treatment resulted in the loss of the sequencing signal over the linear RNA targets without affecting the sequence coverage over the circRNA targets. This highlights the ability of the panel to enrich for circRNAs with high specificity. The results from FFPE samples showed a substantial reduction in the number of circRNAs detected, as well as reduced expression relative to their linear counterparts, when compared with results from all other samples. We speculate that the reduction in the number of circRNAs detected in the FFPE samples is caused by the strong crosslinking and fragmentation introduced by the tissue fixation process. Further adjustment and optimization of the RNA extraction from FFPE could improve circRNA detection. Strikingly, we find several circRNAs to be highly expressed in the serum sample. Three examples of such circRNAs are shown in Fig. 4B. Expression of DNAJC6 is of particular interest, since only the circRNA isoform is expressed in blood and serum. DNAJC6 encodes the neuronal protein Auxilin, which is known to have a role in clathrin-mediated endocytosis and to be implicated in the pathogenesis of Parkinson disease and intellectual disability 38,39 . Previous data from the Genotype-Tissue Expression (GTEx) project show that DNAJC6 is exclusively expressed in the brain 40,41 . The presence of circRNA in serum reflects the high stability of circRNA as compared to linear RNA transcripts. DNAJC6 represents an example of circRNAs that could be detected in blood and serum. Including a larger number of serum samples and thorough validation experiments could pave the way to unravel the possibility of using circRNAs as biomarkers for a particular biological or pathological condition.
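
The relative circRNA expression defined above, circRNA reads/(LinOut reads + circRNA reads), can be illustrated with a short Python sketch. The function name and the read counts below are hypothetical and chosen only to show the arithmetic; returning `None` for targets without any reads is a design assumption of this sketch rather than something stated in the text.

```python
def relative_circ_expression(circ_reads, linout_reads):
    """Fraction of reads for a gene that come from its circRNA target.

    Implements circ / (LinOut + circ). Returns None when neither target
    has any reads, since the ratio is undefined in that case.
    """
    total = circ_reads + linout_reads
    if total == 0:
        return None
    return circ_reads / total


# Made-up counts: (circRNA reads, LinOut reads) for three hypothetical cases.
examples = {
    "mostly_linear": (50, 950),
    "mostly_circular": (900, 100),
    "linear_signal_lost": (800, 0),  # the pattern expected after RNase R treatment
}

for name, (circ, linout) in examples.items():
    print(name, relative_circ_expression(circ, linout))
```

A value close to 1 means that nearly all reads for that gene originate from the circular isoform, which is the pattern described for the RNase R treated control.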
CircRNA detection using padlock probes and RCA.
In order to validate our findings from the circRNA panel and to evaluate the possibility of visualizing circRNAs in situ, we adapted a protocol for padlock probes in combination with rolling circle amplification (RCA) 42 . Initially, we tested this method in human fixed cells. We performed multiplex detection of three circRNAs (CDR1as, HIPK3 and MAN1A2) using padlock probes targeting sequences surrounding the back-splice junctions followed by RCA (Fig. 1B). These targets were selected due to their high expression as seen in our sequencing data, and all of them were successfully detected in the SHSY-5Y cell line (Fig. 5A). Although padlock detection in the cell line is less sensitive than the circRNA panel, the relative abundance of the RCA signal for the circRNA targets correlated well with the expression pattern obtained from the circRNA panel (Fig. 5B). To test the method's ability for dual detection of circRNAs and their corresponding linear RNA, we designed additional padlock probes for the HIPK3 and MAN1A2 linear RNA transcripts. Both the circRNA and linear RNA were simultaneously detected for both genes (Fig. 5C). In accordance with our sequencing data, the padlock experiment showed higher expression for circRNAs from both genes as compared to their corresponding linear isoforms. Raw files for the original microscope images are presented in Supplementary Figure S4.

Subcellular localization of circRNA in the brain.
While it is believed that the majority of circRNAs are cytoplasmic, the detection of circRNAs in the nucleus has also been reported 19 . Since current reports suggest that circRNAs could display functions independent from the function of their host gene, we considered it of interest to investigate whether some circRNAs show a different localization than their cognate linear RNA molecules. Therefore, we extended the use of the padlock approach and RCA to study the subcellular localization of circRNA in vivo.
To define the cellular localization of circRNAs, we applied an in situ sequencing approach, which in principle allows the detection of hundreds of targets in one experiment, via the application of padlock probes and RCA 43 . We targeted highly expressed circRNAs detected in our brain RNA sequencing data and investigated their subcellular localization in a fresh/frozen brain tissue section. The tissue section used in this experiment is from sample 5 (see Supplementary Materials), which was sequenced using the circRNA panel. In this experiment, we designed padlock probes targeting 20 bp sequences overlapping and surrounding the back-splice junction of 8 circRNA targets (see Supplementary Materials for padlock probe sequences) (Fig. 1A). We also targeted their corresponding mRNA by targeting exons downstream of the back-splice junction. As a control for the cellular localization experiment, we included padlocks targeting the MALAT1 lncRNA, which is predominantly expressed in the nucleus 44,45 . Following RCA of the specifically bound padlock probes, we subjected the RCA products (RCPs) from each target to sequencing-by-ligation 43 . Each individual padlock probe harbored a unique barcode (4 nucleotides) to distinguish different targets 43 (Fig. 6A). We used DAPI to stain cell nuclei, and used these images to segment the data into signals that are located within the DAPI stained nuclear areas ("nuclear"), and signals that localize outside these areas ("cytoplasmic"). Since we are not using confocal microscopy, some true cytoplasmic signals may end up in the areas defined as nuclear with this approach, while true nuclear signals are unlikely to end up in the defined cytoplasmic compartment. We first plotted the number of sequence reads from the in situ sequencing for each target and then assigned them to their subcellular localization. As expected, in situ sequencing results indicated that MALAT1 was highly expressed in the nucleus. Of the eight circRNA targets tested, six were found to be enriched in the cytoplasm (Fig. 6B, and Supplementary Figure S5). The circRNA target SNAP47 showed equal distribution between nuclear and cytosolic compartments, while MGAT5 showed predominantly nuclear localization.
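
The nuclear versus cytoplasmic assignment described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the image analysis pipeline used in the study: the nuclear mask is assumed to be already available as a 2D grid of booleans derived from the DAPI image, each decoded signal is assumed to have integer pixel coordinates, and the target labels and spot positions are made up.

```python
from collections import defaultdict


def nuclear_fraction(spots, nuclear_mask):
    """Fraction of signals per target that fall inside the nuclear mask.

    spots: iterable of (target, row, col) tuples, one per decoded signal.
    nuclear_mask: 2D list of booleans, True where the DAPI image was
    segmented as nucleus (hypothetical input format).
    """
    counts = defaultdict(lambda: [0, 0])  # target -> [nuclear, total]
    for target, row, col in spots:
        counts[target][1] += 1
        if nuclear_mask[row][col]:
            counts[target][0] += 1
    return {target: nuc / total for target, (nuc, total) in counts.items()}


# Toy 4 x 4 image with a 2 x 2 nucleus in the centre.
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
    [False, False, False, False],
]

# Made-up decoded signals for a nuclear control and one circRNA target.
spots = [
    ("nuclear_control", 1, 1), ("nuclear_control", 2, 2), ("nuclear_control", 1, 2),
    ("circ_target", 0, 0), ("circ_target", 3, 1), ("circ_target", 1, 1), ("circ_target", 0, 3),
]

fractions = nuclear_fraction(spots, mask)
for target, fraction in fractions.items():
    print(target, fraction)
```

Because the mask is a 2D projection, a cytoplasmic signal lying above or below a nucleus would be counted as nuclear here, which matches the caveat about non-confocal imaging noted in the text.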

Discussion
Although there is an increasing interest in unraveling the function of circRNAs and understanding their role in human disease, there is a lack of efficient and reliable tools to study their basic characteristics such as their expression levels, especially in relation to their cognate linear mRNA, and their cellular localization. In this study, we developed two approaches to allow an improved understanding of the nature and function of circRNAs. In the first approach, we aimed to develop an efficient and cost-effective tool to characterize the expression patterns of circRNAs in different tissues and cells. To achieve this, we designed a pilot target-enrichment sequencing panel using the Ion AmpliSeq™ technology to enrich for circRNAs known to be expressed in human brain and their corresponding mRNA transcripts. Using this panel, we produced an average of 8 million reads/sample, which resulted in a coverage over circRNAs several orders of magnitude higher than RNA-seq with 40 million reads/sample. In the second approach, we adapted the padlock technology in combination with RCA and in situ sequencing to detect and identify the spatio-temporal expression of circRNAs at the subcellular level in cells and fresh/frozen tissue sections. The ability to quantify the abundance of circRNAs in various tissues and to identify how their expression is correlated with their corresponding mRNA is crucial to understand the nature and the function of circRNAs. Using the circRNA panel, we successfully enriched for the circRNAs and their corresponding mRNAs from various input materials. We find the circRNA panel to be more efficient at detecting circRNAs than total RNA-seq. Using regular coverage total RNA-seq data (up to 100 million reads), the quantification of circRNAs together with their linear counterparts is limited to the most abundantly expressed circles.
The target enrichment approach leads to significantly higher sequence coverage over circRNA and mRNA targets as compared to total RNA-seq. This enabled the detection of circRNAs with very low (or no) read coverage in RNA-seq data, leading to improved and more sensitive estimation of circRNA expression, especially for circRNAs expressed at low levels or in specimens that contain a low amount of RNA. Several of these targets were detected in RNase R treated samples, indicating that lack of detection in total RNA was not due to inability to align the sequence reads (data not shown). The target enrichment allowed us to screen multiple samples (n = 19) using one P1 Ion Proton sequencing chip, producing approximately 80 million reads/P1 chip. On the other hand, the total RNA-seq for the samples in our screening was run with 2 samples/chip, producing approximately 40 million reads/sample, highlighting the cost effectiveness of the circRNA panel over total RNA-seq. We do note that two linear targets in the panel yielded low or no signal (Fig. 2C), indicating that primers failed. In light of this, it is also important to note that the circRNA panel designed for this study is a proof-of-concept pilot panel, created only to show the feasibility of high throughput screening. It is possible to scale this up, with proper testing of primers for both circular and linear targets and including spike-in targets with known concentration, to quantify thousands of circRNA and linear RNA targets in multiple samples using a single panel.

Figure 6. (A) ... In addition, circRNA from CDR1as and linear RNA from MALAT1 (known to be exclusively expressed in the nucleus, and used as a control for the localization experiment) were also included. Padlock probe sequences for each of the targets are listed in Supplementary Materials. In situ sequencing was performed to identify the 4-bp specific sequence barcodes in RCA products (RCPs) of each of the targets as shown in 43 . All targets were plotted on the DAPI image of the brain tissue section after in situ sequencing. Localization of each of the detected targets is shown as a symbol, left panel. Each symbol represents a barcode sequence that corresponds to a specific target. The nuclei are shown in grey. Scale bar represents 100 μm. Zoom in region corresponds to the region in the white dash-line square. (B) Abundance and subcellular localization of circRNA and linear RNAs. The ratios of signals in nuclei for all targets were plotted against their barcode counts in the brain sample. Similar to the circRNA panel data, CDR1as was the most abundant circRNA in the in situ sequencing experiment. Also, in agreement with previous data, MALAT1 was highly expressed in the nucleus in the in situ sequencing experiment.
Using the circRNA panel to analyze circRNA and mRNA expression in various tissues we found, in accordance with previous reports, that circRNAs show distinct expression patterns between different tissues. Moreover, the circRNA to mRNA expression ratio exhibited different patterns among the different samples. Detection of circRNA in FFPE samples was particularly inefficient. Although we could still detect a small number of highly expressed circRNAs in FFPE samples (Supplementary Materials), there was a clear reduction of circRNA as compared to their linear counterparts. These results indicate that circRNAs are largely depleted from total RNA extracted from FFPE samples, either due to degradation or the general challenges associated with RNA purification from FFPE samples. Hence, it is important to note that standard extraction of RNA from FFPE samples will lead to erroneous measures of circRNA in relation to linear RNA, irrespective of how the circRNA screening is performed.
We found several examples where the circRNA isoform is highly expressed, while the linear mRNA isoform is not expressed or expressed at very low levels. One example is DNAJC6, which encodes a neuronal protein and whose mRNA, in our data and in previous studies, is expressed exclusively in the brain 40 . Here we show for the first time that its circRNA isoform is expressed at high levels in blood, and is also detectable in serum. These results support a possible role of circRNAs as stable tissue-specific biomarkers accessible in serum. The findings also highlight the efficiency and feasibility of our circRNA panel to facilitate the analysis of a large number of such samples to further evaluate and test the hypothesis that circRNAs could represent potential biomarkers for diagnosis and treatment.
Prior to this study, defining the cellular localization of circRNA and their cognate linear mRNA in the same experiment using common approaches, such as FISH, was hampered by the sequence similarity between circRNA and their corresponding mRNA and by the fact that circRNAs are typically composed of short sequences. As for other non-coding RNAs, defining the subcellular localization of circRNA could provide valuable insights into their function. For instance, cytoplasmic circRNAs are suggested to regulate gene expression through interfering with miRNA pathways, while nuclear circRNAs, which mostly represent intron-containing circRNAs, are suggested to interact with the splicing machinery and promote transcription 19 . In this study, we demonstrate that the padlock approach followed by RCA can be used to visualize circRNA and their corresponding mRNA in cells and tissue sections. The abundance of circRNA obtained from the padlock experiment in the SHSY-5Y cell line correlated well with the results from the circRNA panel. However, it is clear that the method has limited sensitivity, as only a few targets per cell are detected. We believe that further optimization is possible to increase the sensitivity for detecting circRNAs expressed at moderate and low levels. When combined with in situ sequencing (sequencing-by-ligation), this approach allowed multiplex detection of circRNA and their corresponding mRNA in human brain at the subcellular level. These results highlight the possibility of using our in situ sequencing strategy to define the subcellular localization of circRNAs and how they co-localize with their corresponding mRNA isoforms. To our knowledge, this strategy is the first of its kind to allow multiplex detection and subcellular localization of circRNAs and their corresponding mRNAs in situ. In addition, it offers an opportunity to resolve the spatio-temporal expression of circRNA in complex tissues such as the brain.
Both the circRNA panel and the circRNA padlock visualization approaches presented here will pave the way for an improved hypothesis-driven targeted analysis of circRNA functions in cells and tissues. The enrichment provided by the circRNA panel allows a cost-effective approach to perform surveys of circRNA expression profiles in large sample sets. The circRNA panel setup offers the possibility to increase the number of targets and to perform custom design for the circRNAs of interest. On the other hand, the padlock strategy offers, for the first time, an experimental approach to perform multiplex detection of circRNA cellular localization in vivo. This approach can also be used to define the spatial context of circRNA expression profiles between different cell populations within the same tissue. In conclusion, our approaches represent a powerful toolbox for deep profiling and characterization of circRNA and for unlocking their roles as potential diagnostic biomarkers.

Material and Methods
Samples. Informed consent was obtained from subjects or their guardians for the collection of blood and tissue samples. Sample collection, methods and experimental protocols were carried out in accordance with local guidelines and regulations. The use of the samples and experimental protocols was approved by the Uppsala Ethical Review Board (dnr 2010/236 for blood samples and 2012/082 for tissue samples).
Sample preparation for total RNA sequencing. Total RNA from the fetal frontal lobe, fetal liver and fetal heart was purchased from Capital Biosciences. Total RNA from fetal tissues and SHSY-5Y cells was extracted using the RiboPure™ RNA purification kit (Ambion) according to the manufacturer's instructions. Total RNA purification for these samples and 6 human blood samples was performed using the RiboPure™ RNA purification kit (Ambion) according to the manufacturer's instructions. Total RNA from fetal and adult liver, fetal and adult heart, and placenta was purchased from Clontech.
Discovery of circular RNAs in total RNA-seq data. To identify circular RNAs in total RNA-seq data, a computational pipeline adapted from 9 was used. In short, reads were aligned using STAR 48 with the following parameters to identify chimeric transcripts: --chimSegmentMin 20 --chimScoreMin 1 --alignIntronMax 1000000 --outFilterMismatchNmax 4 --alignTranscriptsPerReadNmax 10000 --outFilterMultimapNmax 2. The candidate chimeric junction reads were then filtered to only include cases where the read spanned a junction with the splice acceptor on the same chromosome and strand as the splice donor, at most 100,000 bp upstream. In the subsequent analysis, only circular junctions matching GT-AG splice sites were included. All the scripts are available at https://github.com/orzechoj/circRNA_finder.git and further details are provided in 9 .
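
The junction filter described above can be sketched in Python as follows. This is an illustrative re-statement of the three criteria (same chromosome and strand, acceptor at most 100,000 bp upstream of the donor, GT-AG splice sites) and not the code from the circRNA_finder repository. The `ChimericJunction` record, its field names and the example coordinates are hypothetical, and "upstream" is interpreted here in transcript orientation, so the comparison is reversed on the minus strand.

```python
from dataclasses import dataclass

MAX_SPAN = 100_000  # maximum donor-to-acceptor distance in bp, as stated in the text


@dataclass
class ChimericJunction:
    """Hypothetical, simplified record for one chimeric junction read."""
    donor_chrom: str
    donor_pos: int
    donor_strand: str
    acceptor_chrom: str
    acceptor_pos: int
    acceptor_strand: str
    gt_ag: bool  # True if the junction matches canonical GT-AG splice sites


def is_backsplice_candidate(j, max_span=MAX_SPAN):
    """Return True if the junction passes the filters described above."""
    if j.donor_chrom != j.acceptor_chrom or j.donor_strand != j.acceptor_strand:
        return False
    if not j.gt_ag:
        return False
    # A back-splice joins a donor to an acceptor that lies upstream of it
    # in transcript orientation.
    if j.donor_strand == "+":
        span = j.donor_pos - j.acceptor_pos
    elif j.donor_strand == "-":
        span = j.acceptor_pos - j.donor_pos
    else:
        return False
    return 0 < span <= max_span


# Made-up junctions illustrating each filter.
candidates = [
    ChimericJunction("chr1", 150_000, "+", "chr1", 120_000, "+", True),   # passes
    ChimericJunction("chr1", 120_000, "+", "chr1", 150_000, "+", True),   # acceptor downstream
    ChimericJunction("chr1", 150_000, "+", "chr2", 120_000, "+", True),   # different chromosome
    ChimericJunction("chr1", 500_000, "+", "chr1", 100_000, "+", True),   # span too large
    ChimericJunction("chr3", 120_000, "-", "chr3", 150_000, "-", True),   # passes (minus strand)
    ChimericJunction("chr1", 150_000, "+", "chr1", 120_000, "+", False),  # non-canonical sites
]

kept = [j for j in candidates if is_backsplice_candidate(j)]
print(f"{len(kept)} of {len(candidates)} junctions kept")
```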
Design of an AmpliSeq panel for circular RNA. Loci to be included in the circRNA AmpliSeq panel were selected using the following criteria. First, circRNAs with high expression in total RNA-seq data from brain were chosen, so that each circRNA originated from a different mRNA transcript (i.e. a back-splice site between annotated exons). We required that the length of each circRNA (assuming introns are spliced out) be at least 300 nt. Controls measuring expression of linear RNA isoforms (LinOut) were placed on the same transcript 300 nt upstream from the circRNAs if possible, otherwise 300 nt downstream. As negative controls, possible back-splice sites with no support from the RNA-seq data were sampled from transcripts expressed in the brain. The only exception to this scheme was CDR1as, which was added manually to the panel and does not have a control for linear mRNA, as it does not originate from a known mRNA transcript. Design of the circRNA panel was based on the hg19 genome and transcript models from the UCSC genome browser.
Sequencing using the circRNA panel. RNA was reverse transcribed to cDNA, and the resulting cDNA was amplified using the custom-made Ion AmpliSeq™ circRNA panel. The sequencing libraries were prepared using the Ion AmpliSeq™ Library Kit 2.0 (Thermo Fisher). Sequencing libraries were purified using the Agencourt AMPure XP reagent (Beckman Coulter) and amplified. The amplicons were quantified using the Fragment Analyzer instrument (Advanced Analytical). Samples were then pooled, followed by emulsion PCR using the Ion Chef System, and sequenced on the Ion Proton System.
Quantification of circRNAs using the circRNA panel. Analysis of data from the circRNA panel was performed using the Coverage Analysis plugin in the Torrent Suite Software. In each of the samples, this analysis counts the number of reads originating from the different targets in the panel (Circ, LinOut and Neg). To make the circRNA panel data comparable between samples, a normalization was performed by dividing each read count by the total number of reads generated for the sample.
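The normalization described above amounts to converting per-target read counts into fractions of the sample total. A minimal Python sketch is given below; the function name and the target names in the usage note are hypothetical, and using the sum over panel targets as the total when no explicit total is supplied is an assumption.

```python
from typing import Dict, Optional


def normalize_counts(counts: Dict[str, int],
                     total_reads: Optional[int] = None) -> Dict[str, float]:
    """Divide each target's read count by the total reads for the sample.

    If total_reads is not given, the sum over all targets is used
    (an assumption; the text only says "total number of reads").
    """
    total = sum(counts.values()) if total_reads is None else total_reads
    if total <= 0:
        raise ValueError("sample has no reads to normalize against")
    return {target: n / total for target, n in counts.items()}
```

For example, a sample with counts `{"Circ_A": 30, "LinOut_A": 60, "Neg_A": 10}` yields a normalized value of 0.3 for `Circ_A`, which can then be compared with the corresponding value in another sample sequenced at a different depth.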
qrtPCR validation. Starting with 1 μg of total, cytoplasmic or nuclear RNA, cDNA was synthesized using the RevertAid First Strand cDNA Synthesis Kit (Fermentas) with random hexamers according to the manufacturer's recommendations. 1 μl of the resulting cDNA was used for qrtPCR to measure the relative circRNA/linear RNA ratio in each sample. The qrtPCR was performed with a Stratagene Mx3000P in 96-well plates. The reactions were carried out with an initial denaturation at 95 °C for 10 min followed by 40
Sequencing-by-ligation. An ethanol wash series was first used to remove the mounting medium from tissue sections. As in a previous study 43 , slides were washed once with DEPC-PBS-T and treated with UNG buffer (1 × UNG buffer (Fermentas), 0.2 μg/μl BSA, 0.02 U/μl UNG (Fermentas)) for 30 min at 37 °C. Slides were washed twice with DEPC-PBS-T, and then washed three times with 65% formamide for 60 s each. In sequencing-by-ligation chemistry, anchor primers and interrogation probes were hybridized separately before the ligation. Briefly, a hybridization mix containing 500 nM of anchor primers in 2 × SSC and 20% formamide was applied to the sample and incubated at RT for 30 min, followed by two washes with DEPC-PBS-T. A ligation mix containing each interrogation probe, 100 ng/ml of DAPI, 1 mM ATP (Fermentas), 1 × T4 ligase buffer (Fermentas) and 0.1 U/μl of T4 ligase (Fermentas) was added to the samples and incubated for 30 min at RT. Following three washes with DEPC-PBS-T, the slides were mounted in SlowFade Gold Antifade Mountant (ThermoFisher).
Image acquisition and analysis. Images were acquired using an AxioplanII epifluorescence microscope (20 × objective). Exposure times for all the experiments are listed in Supplementary Materials. After imaging, the slides were prepared for each of the next three sequencing cycles by treatment with UNG buffer as described above, followed by repeating the hybridization, ligation and imaging steps. The image analysis was performed as described in 43 . A stack of images at different focal depths was captured and merged into a single image, followed by automatic stitching in the Zeiss AxioVision software. The fully automated sequence decoding was performed as in a previous study 43 . In short, cell nuclei were separated based on shape descriptors. The cell cytoplasm was defined using the nucleus as a seed, which depends on sufficient cytoplasmic autofluorescence. The image of the general stain was enhanced by a top-hat filter, and RCPs were separated by watershed segmentation. The images were aligned, and the fluorescence intensity of each of the signals representing A, C, T and G was extracted. The optimal transformation between a merged image of all signals (A+C+T+G) and the general stain was determined based on intensity 49 . Analyses were performed with CellProfiler (2.1.1, 6c2d896) 50 calling ImageJ plugins from Fiji for image registration. All intensity information was decoded using a script written in Matlab. Briefly, each RCP was assigned the base with the highest intensity in each hybridization step. A quality score was extracted for each base, and the quality of a transcript was defined as the lowest quality of all the bases in the transcript. The quality score ranges from 0.25 (poor quality) to 1 (good quality). The frequency of each sequence was extracted after a typical quality threshold of 0.4-0.55 was set.
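The decoding logic described above can be sketched in Python as follows. This is not the Matlab script used in the study. The per-base quality is assumed here to be the highest channel intensity divided by the sum of the four channel intensities, which is consistent with the stated range of 0.25 to 1 but is not spelled out in the text; the function names and the default threshold of 0.45 (an arbitrary value inside the stated 0.4-0.55 range) are likewise assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

BASES = ("A", "C", "G", "T")


def call_base(intensities: Dict[str, float]) -> Tuple[str, float]:
    """Return the brightest base and its quality for one sequencing cycle.

    Quality is assumed to be max intensity / sum of intensities, so it is
    0.25 when all four channels are equal and 1 when only one has signal.
    """
    total = sum(intensities[b] for b in BASES)
    if total <= 0:
        raise ValueError("no signal in any channel")
    base = max(BASES, key=lambda b: intensities[b])
    return base, intensities[base] / total


def decode_rcp(cycles: List[Dict[str, float]]) -> Tuple[str, float]:
    """Decode one RCP; transcript quality is the lowest per-base quality."""
    calls = [call_base(c) for c in cycles]
    sequence = "".join(base for base, _ in calls)
    quality = min(q for _, q in calls)
    return sequence, quality


def count_sequences(rcps: List[List[Dict[str, float]]],
                    threshold: float = 0.45) -> Counter:
    """Count decoded sequences whose quality meets the threshold."""
    counts: Counter = Counter()
    for cycles in rcps:
        sequence, quality = decode_rcp(cycles)
        if quality >= threshold:
            counts[sequence] += 1
    return counts
```

Taking the minimum over cycles means that a single ambiguous base is enough to discard an RCP, which favours specificity over the number of decoded signals.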