Identification of important, functional small RNA (sRNA) species is currently hampered by the lack of reliable and sensitive methods to isolate and characterize them. We have developed a method, termed target-enrichment of sRNAs (TEsR), that enables targeted sequencing of rare sRNAs and diverse precursor and mature forms of sRNAs not detectable by current standard sRNA sequencing methods. It is based on the amplification of full-length sRNA molecules, production of biotinylated RNA probes, hybridization to one or multiple targeted RNAs, removal of nontargeted sRNAs and sequencing. By this approach, target sRNAs can be enriched by a factor of 500–30,000 while maintaining strand specificity. TEsR enriches for sRNAs irrespective of length or different molecular features, such as the presence or absence of a 5′ cap or of secondary structures or abundance levels. Moreover, TEsR allows the detection of the complete sequence (including sequence variants, and 5′ and 3′ ends) of precursors, as well as intermediate and mature forms, in a quantitative manner. A well-trained molecular biologist can complete the TEsR procedure, from RNA extraction to sequencing library preparation, within 4–6 d.
Many native sRNAs are difficult to detect and quantify because of their unique molecular features, including short length, stable secondary structure, low abundance, lack of recognizable features such as poly-A tails, presence of isoforms (e.g., isomiRs) and multiple precursor/intermediate forms1,2,3,4,5. Although most research so far has focused on microRNAs (miRNAs)6, diverse forms of sRNAs are found involved in numerous biological processes, such as transcriptional and post-transcriptional regulation, epigenetic modification of chromatin, control of genome stability and cell-to-cell communication7,8. Functional sRNAs, including miRNAs, are dynamically regulated and can be extremely scarce in different tissues or experimental setups. Examples include circulating miRNAs existing at levels as low as 0.001 molecule per exosome4, functional fragmented tRNAs that are rare in testicular sperm9, and certain PIWI-interacting RNA species (piRNAs) specifically expressed in the brain10. Therefore, sRNA research faces the constant need to address the fundamental but challenging questions of (i) whether or not (or in which conditions) sRNAs exist; (ii) how many sRNA isoforms there are and what are their lengths, sequences and quantities; (iii) what are the precursors and intermediates of a mature sRNA; and (iv) how the expression levels of rare sRNAs change between different biological conditions.
Development of the protocol
We developed a powerful method3, termed TEsR, that enables researchers to answer the above questions by reliably detecting and quantifying multiple forms of sRNAs, including rare sRNAs (Fig. 1). To allow full-length sequencing of mature sRNAs and their precursors, TEsR circumvents two common processes that exclude longer forms of sRNAs, namely size selection (by gel, bead or column) and PCR amplification by random primers. In TEsR, linkers are ligated to each of the two ends of the full-length sRNA molecules, allowing strand information to be maintained and thus enabling the detection of both sense and antisense sRNAs. To minimize undesired ligations, TEsR uses amine-blocked linkers to block 3′ ends and pre-adenylated DNA linkers that show high reactivity with the 3′ end of the RNA. Optimal use of ligases was tested for high RNA–DNA and RNA–RNA ligation specificity and efficiency. Moreover, to also capture the sRNAs that are capped, including many small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs), tRNAs and miRNAs, a 5′-cap-removal step is applied before ligating the 5′-RNA linker to the RNA. Most pri-miRNAs (primary transcripts) are produced by RNA polymerase II and carry a canonical 5′ cap11, and a large number of mature, functional sRNAs retain the 5′-cap structure12,13. Initial stringent denaturing conditions and immediate mixing with an excess of hybridization probes ensure the removal of sRNA secondary structure and prevent their reformation. After linker ligation, TEsR amplifies all ligated sRNAs by PCR, specifically capturing targeted molecules by hybridization while washing away PCR products from abundant and/or irrelevant RNA species.
Importantly, to date, no ad hoc method for targeted sequencing of sRNA is available. To address this demand, we have developed a complete protocol and optimized conditions specifically suitable for capture and deep sequencing of a broad range of sRNA species.
Applications of the method
TEsR can be used to quantify sRNA expression levels, map their transcription start and end sites, characterize their biogenesis by identifying precursors and intermediates, and study their regulation. The start and end sites of an sRNA are important for understanding sRNA biogenesis pathways and sRNA functionality14; for instance, the 5′ cap and 3′ transcription termination sequence are key contributing factors in important processes such as Dicer and Drosha processing, Argonaute strand selection and nuclear–cytoplasmic transport13,15. TEsR is a powerful method to discover or reannotate sRNAs, to confirm their length, characterize their sequences and identify sRNA isoforms. Reannotation is critical, as only 16% of the metazoan miRBase miRNA genes have robust evidence supporting their existence16. Moreover, TEsR is a suitable option for profiling miRNA markers in human pathological samples17 and can be applied in a clinical setting to detect important rare sRNA species, such as those carried by extracellular vesicles in human plasma, urine or saliva samples18. TEsR can monitor the quantity, stability and possible changes in sequence length of exogenous synthetic sRNAs, which are critical parameters for the development of sRNAs as therapeutic agents19,20. Notably, most methods and technologies for sRNA sequencing are dependent on manufacturing companies for the design and synthesis of hybridization probes/arrays. Instead, TEsR is fully adaptable in-house, and probes can be designed and synthesized by a researcher to target one or many sRNA(s) of interest. The flexibility in generating hybridization probes and the in-solution capture (independent of arrays with printed probes) make TEsR adoptable in numerous research projects. We have tested TEsR on RNA from human and mouse cell lines and on synthetic RNAs.
The protocol implements decapping, denaturing and adapter ligation steps that are compatible with diverse types of sRNA structures (with or without a 5′ cap, with/without stable secondary structure, short mature sRNA or longer pri/pre-miRNA isoforms), and therefore can be applied to a wide range of purified RNA samples from microbes, plants and animals19. Therefore, TEsR has an unparalleled breadth of applications for deep sequencing of sRNAs. In addition, TEsR is cost-effective, as using a low-throughput sequencer (e.g., Illumina MiSeq) is sufficient to generate ultra-deep sequencing data.
Comparison with other methods
Standard next-generation sequencing (nontargeted, whole-transcriptome sequencing (WTS)) is not efficient in detecting rare sRNAs, as the expression of these low-abundance RNAs is masked by much more dominant classes of miRNA and other sRNAs, such as piRNA, snoRNA and tRNA, which commonly occupy the majority of the sequencing space3,21. Thus, although current sRNA sequencing methods are effective in profiling relatively abundant sRNAs, they often cannot or can only weakly detect rare sRNAs6,22. In these cases, the alternative of increasing the sequencing depth or the sample size is often prohibitively expensive and, in general, wasteful.
Methods allowing long RNA target enrichment followed by sequencing (long RNA target enrichment sequencing (TES)), such as the TruSeq targeted RNA expression (Illumina), Ion AmpliSeq (Thermo Fisher) and the CaptureSeq protocol23, have been successfully applied to identify hidden classes of functional long noncoding RNAs24, to discover novel transcript isoforms and splicing branch points, and to correct previous erroneous annotations in mouse and human21,25,26. However, long RNA TES cannot be immediately transferred to capture sRNAs. Current RNA capture sequencing technologies use random primers for cDNA synthesis, which is not a suitable option for sRNA templates: owing to their short length, most sRNAs would remain unprimed or only partially primed, resulting in no or partial cDNA extension. In addition, sRNAs are already too short to be fragmented, and bead/gel purification may lose such short RNAs. Importantly, conditions for hybridization of sRNA targets are markedly different from those for long RNA (e.g., hybridization temperature at ∼47 °C for long RNA compared with 37 °C for sRNA)27 and may pose specificity problems28.
Compared with WTS approaches, the main advantages of the TEsR method are (i) it generates 500- to 30,000-fold deeper sequencing data; (ii) it allows low-cost sequencing of multiple samples and multiple target RNAs in a single run with a small sequencer such as the Illumina MiSeq or NextSeq 500 (Fig. 2); (iii) it captures RNAs at variable lengths ranging from microRNA size (15–28 nt) to long RNA fragments (up to 1 kb, to be compatible with Illumina sequencing); (iv) it captures mature and precursor forms of sRNAs, and it enables differentiation of sense and antisense sRNAs, and thus production of strand-specific sequencing reads; (v) it works robustly with small amounts of input RNA (we were able to detect targeted, rare RNAs from >10 ng of RNA samples or purified spike-in RNA from as low as ten spike-in molecules per sample)29 (Fig. 3a and Q.N., J.A., and P.C., data not shown); and (vi) it captures RNA species with or without a 5′ 7-methylguanylate cap and RNA with different secondary structures.
Compared with quantitative reverse-transcription PCR (qRT-PCR) approaches, TEsR generates the following unique information: (i) sequences of the sRNA, (ii) quantities of each type of isoform and precursor and (iii) a larger number of sRNA targets and multiplexed samples combined in one reaction. As a part of the TEsR procedure, we also devised a new qRT-PCR method for sRNA detection in order to validate the enrichment extent of the target sRNAs before the sequencing step. We deploy a primer complementary to one of the adapters (either 5′ or 3′ adapters) and another primer specifically binding to the sRNA target (Supplementary Fig. 1). Therefore, this new qRT-PCR method circumvents a key constraint in sequence length when designing two PCR primers to bind to the sRNA templates, which are often shorter than the total length of the two primers.
Level of expertise needed to implement the protocol
The TEsR protocol does not require specialized skills and it is fully customizable to suit a wide range of researchers and projects. The techniques required are standard molecular techniques such as PCR, centrifugation, gel electrophoresis, DNA and RNA quantification, and Bioanalyzer profiling. With the exception of enzymes or synthesized oligos as specified in the protocol, researchers can design experiments and prepare all buffers needed. Researchers start with designing experiments (selecting targets, designing probes, linker sequences and sequencing index primers), then synthesize biotin-labeled RNA probes, and prepare hybridization and wash buffers.
The biggest limitation of TEsR is the requirement for sequence information of the targeted sRNAs for designing the capture probes, which may not be available when studying novel sRNAs. Nevertheless, RNA sequence is always complementary to the original DNA template, and thus can be predicted on the basis of the DNA template sequence. If the RNA exists and is captured by TEsR, the detailed sequences will be validated29. The prediction can be more complicated in the case of splicing events, which are less common for sRNAs, and DNA mutations. Nevertheless, TEsR does not require exact complementary matches between probes and DNA template targets, and thus can potentially detect single-nucleotide polymorphisms in RNA with high confidence due to the high read depth. We have recently validated the power of this DNA-based prediction approach by applying the TEsR method to detect novel functional telomeric DNA damage response RNAs (tDDRNAs)29. In this work, we predicted the tDDRNA sequence on the basis of the telomere DNA sequence, and we could both quantify the expression level and characterize the lengths and isoforms of the tDDRNA.
In this PROCEDURE, nonhybridized RNAs are washed away, whereas targeted RNA molecules are retained and sequenced (Fig. 1). The probe and primer design are flexible and can include single or multiplexed samples, in order to capture one or hundreds of targets. We recommend using the following controls: linker ligation (Steps 48–60), to compare Bioanalyzer profiles of PCR products with or without linker ligation; RNA probe synthesis (Steps 1–28), to compare the digestion of T7 in vitro transcription products with or without RNase treatment; hybridization (Steps 73–109), to compare products after the washing step in samples treated with or without hybridization probes; and qPCR of sequencing libraries (Steps 110–114), to use samples treated with or without capturing probes. The sequencing libraries are compatible with single-end and paired-end sequencing. In most cases, the single-end protocol (50-bp read length) is sufficient to sequence whole mature miRNAs. If the capture targets are long isoforms (e.g., pri/pre-miRNAs) or longer sRNAs (e.g., tRNA), the paired-end protocol is more suitable to enable the assembly and detection of the 3′ termination sequence. Some steps can be cut short by purchasing a large amount of long oligos (>100 bp, with T7 primer already incorporated into the universal primers, which can make Steps 1–28 faster) or by ordering the fully modified oligos, such as the pre-adenylated 3′ ssDNA adapter (to skip the adenylation steps (Steps 29–47) and save ∼8 h). Researchers may choose these options to save time. We describe the fully customizable PROCEDURE for more flexible implementation of the protocol.
Strategy to capture multiple targets and estimate the enrichment level. To estimate the enrichment level, researchers can capture one or multiple endogenous sRNAs and compare the sequencing depth of these sRNAs between a standard sRNA protocol and a capture protocol27. We show an example in which we captured three different sRNAs, miR29b-1, Snord68 (small nucleolar RNA, C/D box 68) and Snord70 (small nucleolar RNA, C/D box 70), in four independent TEsR experiments (Fig. 2). Four total RNA samples from a mouse embryonic fibroblast cell line were used. One probe was designed for each sRNA. Sequences and locations for probes and targets are shown in Supplementary Tables 1 and 2, and Supplementary Figure 2. For each experiment, simultaneous hybridization of three sRNA targets was achieved by simply pooling different probes at equimolar concentrations in one hybridization tube. Comparisons of results obtained from capturing with those from standard sequencing are discussed in the ANTICIPATED RESULTS.
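The comparison between a standard and a capture library can be sketched as a ratio of normalized read counts. The following is a minimal illustration, assuming that per-target read counts and total mapped reads are available for both libraries; the function names and the example counts are hypothetical and are not data from this protocol.

```python
def counts_per_million(target_reads, total_reads):
    """Normalize a raw read count to counts per million mapped reads (CPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return target_reads * 1_000_000 / total_reads


def fold_enrichment(capture_reads, capture_total, standard_reads, standard_total):
    """Target CPM in the capture library divided by target CPM in the standard library."""
    standard_cpm = counts_per_million(standard_reads, standard_total)
    if standard_cpm == 0:
        # A target absent from the standard library has no defined ratio.
        raise ValueError("target not detected in the standard library")
    return counts_per_million(capture_reads, capture_total) / standard_cpm


# Hypothetical counts for a single target (illustrative only).
example = fold_enrichment(
    capture_reads=400_000, capture_total=2_000_000,
    standard_reads=50, standard_total=5_000_000,
)
print(f"fold enrichment: {example:,.0f}x")
```

Normalizing to CPM before taking the ratio keeps the estimate independent of the different sequencing depths of the two libraries.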
Design of spike-in RNA and technical replicates to evaluate the quantitativeness and reproducibility. We present the strategy and examples for assessing the linear correlation of capturing RNA from different concentrations and technical replicates. Spike-in RNAs and serial dilutions can be used to estimate the correlation between final read counts and initial concentrations of target RNA molecules. We captured two different RNA targets separately, each of which had three different dilutions (Fig. 3a). The first target was a chemically synthesized spike-in RNA molecule with a sequence distinct from any known endogenous sRNA sequence in human and mouse (Supplementary Table 2). The second target was the human endogenous miR29b-1. Correlation between concentrations and read count per million values for both miR29b-1 and the spike-in target (R² > 0.99) shows the linear range of the data. Furthermore, the reproducibility of the method can be assessed by capturing and sequencing the same sRNAs (Snord70, Snord68 and miR29b-1) in two technical replicates (Fig. 3b). The two replicates were prepared by splitting the same sample into two identical aliquots. All three captured targets were consistently located on the regression line between the two replicates (Pearson r = 0.9996, R² > 0.9983), demonstrating the quantitative reproducibility of TEsR (Fig. 3b). For normalization, we introduced options to measure both endogenous RNA (e.g., miR29b-1) and exogenous spike-in as normalization RNAs29.
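The linearity and reproducibility checks described above come down to a correlation coefficient between two series of values. The sketch below computes Pearson r with the standard library only; for a simple linear fit with one predictor, R² equals r². The dilution amounts and CPM values are invented for illustration and are not taken from Fig. 3.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical spike-in dilution series: input amount vs. observed CPM.
input_amount = [1.0, 10.0, 100.0]
observed_cpm = [12.0, 118.0, 1210.0]

r = pearson_r(input_amount, observed_cpm)
print(f"Pearson r = {r:.4f}, R^2 = {r * r:.4f}")
```

The same function can be applied to the CPM values of two technical replicates to assess reproducibility.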
Strategy to detect multiple sRNAs and their precursors. TEsR captures a wide range of sRNA sizes (18–200 nt) and longer sRNA precursors (up to 1 kb, or even longer, if a fragmentation step is used to make sequencing read length compatible with the Illumina platform). This is important because most sRNAs are processed from long precursors, but methods allowing simultaneous sequencing of both short and long forms of sRNAs are currently not available. We tested the ability of TEsR to detect both mature (short) and precursor (long) forms of sRNAs (Fig. 4). To this aim, we applied TEsR using probes for Snord68, Snord70 and miR29b-1 targets, all having different secondary structures and different processing patterns (Fig. 4 and Supplementary Figs. 2 and 3). For Snord68 and Snord70, it is noteworthy that the annotations are inconsistent between databases. Although the Refseq database reports sequence lengths of 48 nt for Snord68 and 53 nt for Snord70 (shown by purple and green lines in Fig. 4a; more details are available in Supplementary Table 1), the Ensembl database contains length annotations of 86 nt and 87 nt, for Snord68 and Snord70, respectively (mouse genome assembly GRCm38/mm10; Supplementary Table 1). TEsR identified, at single-nucleotide resolution, the start and end sites of the extended sequences of Snord68 and Snord70, with the majority of molecules consistent in length and sequence with the Ensembl annotations (shown by the purple and blue lines in Fig. 4a). One prehybridized library can be used for hybridization of multiple probes to targets. Researchers can empirically increase the amount of probes (higher probe-to-target ratio) to increase the efficiency of hybridization. We have applied 30 probes in one capture (Supplementary Fig. 4a,b) and the total number of probes can be increased to thousands of probes, depending on the number and length of the sRNA targets, similar to protocols for capturing long RNAs27 and DNA exome sequences (Illumina).
Design of probes to capture multiple sRNAs and sense/antisense RNAs. A single probe is sufficient for an sRNA target. The optimum probe length is 66 bp, and the selected sequence complementary to a targeted RNA is incorporated between the two universal primers at the two ends (15 bp). If targeting sRNAs (and precursors) longer than 66 bp, multiple probes spanning the whole length of the target can be selected using OligoWiz v2.3.0 (ref. 30). Customized parameters used for OligoWiz probe design include sizes ranging from 60 to 70 bp (with the highest score for probes 66-bp long) and probe position, which was given an equal weight across the whole length of the target template. Probes are scored using four criteria: (i) potential cross-hybridization to human-coding RNAs, (ii) melting temperature (Tm) range among probes (ΔTm), (iii) folding energy and (iv) low complexity (Supplementary Fig. 4). The initial set of probes selected by OligoWiz are then screened to remove potential cross-reactivity, self-dimerization or stable stem loops by a sliding algorithm applied in the AutoDimer software31.
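The layout of a bait oligo (a target-complementary insert flanked by two universal primers) and the tiling of long targets can be illustrated with a short sketch. This is a naive stand-in and not the OligoWiz scoring or AutoDimer screening described above: windows are placed end to end without considering Tm, folding energy or cross-hybridization. The primer sequences are placeholders, the 15-nt length is read here as applying to each primer, and the choice of which strand to order depends on the T7 promoter orientation, which the sketch does not address.

```python
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Reverse complement in the DNA alphabet (U in the input is read as T)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tile_target(target, probe_len=66):
    """Cut a target into probe_len-nt windows covering its whole length.

    Targets no longer than probe_len give a single window. The last window
    is shifted back so that it ends exactly at the 3' end of the target,
    which means it may overlap the previous window.
    """
    target = target.upper()
    if len(target) <= probe_len:
        return [target]
    starts = list(range(0, len(target) - probe_len + 1, probe_len))
    if starts[-1] + probe_len < len(target):
        starts.append(len(target) - probe_len)
    return [target[s:s + probe_len] for s in starts]


def build_bait(window, fwd_primer, rev_primer):
    """Place the complement of a target window between two universal primers."""
    return fwd_primer + reverse_complement(window) + rev_primer


# Placeholder primers (hypothetical; substitute the real universal primers).
FWD_PRIMER = "N" * 15
REV_PRIMER = "N" * 15

# Hypothetical 160-nt precursor sequence used only to show the tiling.
precursor = "ACGT" * 40
for i, window in enumerate(tile_target(precursor), start=1):
    bait = build_bait(window, FWD_PRIMER, REV_PRIMER)
    print(f"probe {i}: insert {len(window)} nt, bait {len(bait)} nt")
```

In practice, candidate windows produced this way would still need to be scored and screened with the dedicated tools named in the text.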
When designing probes, one should take into consideration that with this protocol, one probe can capture both sense and antisense targets in a single reaction. By using linker sequence information in the sequencing read outputs, the TEsR method allows the detection of sRNA with strict strand specificity, such as, for instance, the human GAPDH locus (Supplementary Fig. 4) or the tDDRNA29.
Design of probe(s) to detect novel RNAs. The hypothetical sequence of a novel RNA can be predicted based on the DNA sequence. Depending on the length of the predicted RNA targets, one probe (for short targets) or multiple tiling probes (for long targets) are used. If the RNA exists, the designed probe will specifically capture the RNA, and the TEsR sequencing result can reveal the detailed sequence of the RNA and its isoforms. Supplementary Figure 4a,b shows an example of designing multiple tiling probes based on the predicted DNA template of predicted novel RNA targets. For shorter RNAs, we designed one probe, for instance, a short probe consisting of 11 repeats of the TTAGGG motif, (TTAGGG)11, for the detection of tDDRNAs29.
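Predicting candidate RNA sequences from a DNA region can be sketched as follows. This is a minimal illustration rather than part of the published procedure: the function name `predicted_transcripts` is hypothetical, and splicing and sequence variants, which can complicate prediction, are ignored.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def predicted_transcripts(top_strand_dna):
    """Return the RNA expected from each strand of a DNA region.

    'sense' has the sequence of the given (top) strand with T replaced by U;
    'antisense' is its reverse complement, i.e. the RNA that would be
    produced by transcription in the opposite direction.
    """
    dna = top_strand_dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("expected an unambiguous DNA sequence (A, C, G, T)")
    reverse_complement = dna.translate(_DNA_COMPLEMENT)[::-1]
    return {
        "sense": dna.replace("T", "U"),
        "antisense": reverse_complement.replace("T", "U"),
    }


# Telomeric repeat region written 5'->3' as 11 copies of TTAGGG (66 nt).
telomere = "TTAGGG" * 11
for strand, rna in predicted_transcripts(telomere).items():
    print(strand, len(rna), rna[:12] + "...")
```

Because TEsR reads retain strand information, both predicted orientations can be checked against the sequencing output.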
Biotin RNA labeling mix (16-dUTP; Roche, cat. no. 11685597910)
Ambion MAXIscript T7 in vitro transcription kit (Life Technologies, cat. no. AM1312M)
T4 RNA ligase 2 (truncated; NEB, cat. no. M0242L)
T4 RNA ligase 1 (NEB, cat. no. M0204L)
5′ DNA Adenylation Kit (NEB, cat. no. E2610L)
MinElute PCR Purification Kit (Qiagen, cat. no. 28006)
Nuclease-free water (Thermo Fisher Scientific, cat. no. 15230147)
Phusion high-fidelity (HF) DNA polymerase (2 U/μl; Thermo Fisher Scientific, cat. no. F-530S)
Dr GenTLE precipitation carrier (pH 5.2; Takara, cat. no. 9094)
Tobacco acid pyrophosphatase (TAP; Epicentre, cat. no. T81050)
PrimeScript reverse transcriptase (Takara Clontech, cat. no. 2680A)
Bioanalyzer assays (DNA 1000 Kit (Agilent, cat. no. 5067-1504) and DNA High Sensitivity Kit (Agilent, cat. no. 5067-4626))
Ambion DNA-free DNA Removal Kit (Life Technologies, cat. no. AM1906)
Phenol/chloroform/isoamyl alcohol (25:24:1, saturated with 10 mM Tris, pH 8.0, 1 mM EDTA (pH of the phenolic phase ∼6.5–6.9; Sigma-Aldrich, cat. no. P2803))
Phenol/chloroform/isoamyl alcohol (25:24:1, saturated with 10 mM Tris, pH 8.0, 1 mM EDTA (pH of the phenolic phase ∼7.8–8.2 after addition of equilibration buffer; Sigma-Aldrich, cat. no. P2069))
Dynabeads MyOne streptavidin C1 (Life Technologies, cat. no. 650001)
SUPERase-inhibitor (20 U/μl; Life Technologies, cat. no. AM2694)
Human cot-1 DNA (500 μg; Life Technologies, cat. no. 15279-011)
Ultrapure salmon sperm DNA solution (5 × 1 ml; Life Technologies, cat. no. 15632-011)
50× Denhardt's buffer (Sigma-Aldrich, cat. no. D2532)
Ultrapure sodium chloride–sodium phosphate–EDTA (SSPE; 20×; Life Technologies, cat. no. AM9767)
Ultrapure SSC (20×; Life Technologies, cat. no. AM9770)
SDS (10% (wt/vol); Wako, cat. no. 196-08675)
EDTA (0.5 M, pH 8.0; Ambion, cat. no. AM9260G)
Ultrapure Tris-HCl (pH 7.5; Invitrogen, cat. no. 15567-027)
NaCl (5 M; Sigma-Aldrich, cat. no. S5150-1L)
NaOH (Wako, cat. no. 197-02125)
Chloroform (Wako, cat. no. 035-02616)
Ethanol (Wako, cat. no. 057-00456)
Turbo DNA-free Kit (Ambion, cat. no. AM1906)
Exonuclease I (E. coli; 3000 U; NEB, cat. no. M0293S)
ScriptSeq Index PCR primers (Illumina, cat. no. RSBC10948)
SYBR Premix Ex Taq (Takara, cat. no. RR041A)
KAPA Library Quantification Kit, Illumina platform (KAPA Biosystems, cat. no. KK4835)
SeaKem LE agarose (Lonza, cat. no. F5111)
Total RNA (1–5 μg) or enriched sRNA samples (10–100 ng)
Modified 3′ single-stranded DNA linker (modified with 5′ phosphate group and 3′ amine block) and 5′ ssRNA linker (modified with amine block)
Agilent 2100 Bioanalyzer (Agilent, model no. G2939A)
Thermocyclers (Thermo Fisher Scientific, GeneAmp PCR System 9700)
Real-time PCR System StepOnePlus (Applied Biosystems, cat. no. 4376600)
Dynabeads MPC-S magnetic particle concentrator (Life Technologies, cat. no. A13346)
Dynal MPC-96S magnetic particle concentrator (Life Technologies, cat. no. 120.27)
NanoDrop spectrophotometer (Thermo Fisher Scientific, cat. no. S09NND360)
Water purification system (Milli-Q Synthesis; Millipore, model no. ZMQS6VF01)
MiSeq sequencer (Illumina, cat. no. SY-410-1003)
Laboratory centrifuge (BioSan, model no. LMC-3000 or equivalent)
Mini microcentrifuge (Sigma-Aldrich, cat. no. CL S6766-1EA or equivalent)
Pipettes and tips (Eppendorf or equivalent)
Thermal cycler (Bio-Rad, model no. T100 or equivalent)
Vortexer (Scientific Industries VortexGenie 2 or equivalent)
Schott bottles, 100 ml (Sigma-Aldrich, cat. no. Z742278)
Bead wash buffer
To make 100 ml, add 79.7 ml of nuclease-free water, 20 ml of 5 M NaCl, 100 μl of 1 M Tris-HCl (stored at 4 °C) and 200 μl of 0.5 M EDTA to a sterilized 100-ml Schott bottle.
Wash buffer 1
To make 100 ml, add 94 ml of nuclease-free water, 5 ml of 20× SSC and 1 ml of 10% (wt/vol) SDS to a sterilized 100-ml Schott bottle.
Wash buffer 2
To make 100 ml, add 98.5 ml of nuclease-free water, 500 μl of 20× ultrapure SSC and 1 ml of 10% (wt/vol) SDS to a sterilized 100-ml Schott bottle.
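The final concentrations implied by the three recipes above follow from C1·V1 = C2·V2. The sketch below recomputes them from the listed stock concentrations and volumes; the dictionary layout and function name are illustrative choices, and only the numbers come from the recipes.

```python
# (stock concentration, unit, volume in ml) for each component listed above.
RECIPES = {
    "Bead wash buffer": {
        "NaCl": (5.0, "M", 20.0),
        "Tris-HCl": (1.0, "M", 0.1),
        "EDTA": (0.5, "M", 0.2),
        "water": (None, None, 79.7),
    },
    "Wash buffer 1": {
        "SSC": (20.0, "x", 5.0),
        "SDS": (10.0, "% (wt/vol)", 1.0),
        "water": (None, None, 94.0),
    },
    "Wash buffer 2": {
        "SSC": (20.0, "x", 0.5),
        "SDS": (10.0, "% (wt/vol)", 1.0),
        "water": (None, None, 98.5),
    },
}


def final_concentrations(recipe):
    """Return (total volume in ml, {component: (final concentration, unit)})."""
    total_ml = sum(volume for _, _, volume in recipe.values())
    finals = {
        name: (stock * volume / total_ml, unit)
        for name, (stock, unit, volume) in recipe.items()
        if stock is not None  # water has no stock concentration
    }
    return total_ml, finals


for buffer_name, recipe in RECIPES.items():
    total_ml, finals = final_concentrations(recipe)
    summary = ", ".join(f"{name} {conc:g} {unit}" for name, (conc, unit) in finals.items())
    print(f"{buffer_name}: {total_ml:g} ml total; {summary}")
```

Running the check confirms that each recipe sums to 100 ml and makes the difference between the two wash buffers explicit (1× versus 0.1× SSC, both with 0.1% SDS).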
Synthesis of biotinylated cRNA oligos from cDNA oligos
Timing: ∼9 h
1. Assemble and mix the following components on ice in a nuclease-free 200-μl microcentrifuge tube.

| Component | Volume to add (μl) | Final concentration |
| --- | --- | --- |
| Phusion HF DNA polymerase | 0.5 | 1 U per 50-μl reaction |
| 5× Phusion HF buffer | 10 | 1× |
| dNTP mix (2.5 mM each) | 4 | 200 μM each |
| Universal forward (100 μM) | 1 | 2 μM |
| Universal reverse (100 μM) | 1 | 2 μM |
| Oligo bait (100 μM) | 1 | 2 μM |
| Nuclease-free water | Up to 50 | - |

2. Perform PCR on the reactions from Step 1 using the following cycling conditions:

| Cycle number | Denature | Anneal | Extend |
| --- | --- | --- | --- |
| 1 | 98 °C for 2 min | - | - |
| 2–21 | 98 °C for 30 s | 60 °C for 30 s | 72 °C for 15 s |
| 22 | - | - | 72 °C for 60 s |

3. Purify the PCR products with a MinElute PCR Purification Kit, following the manufacturer's instructions. Elute at room temperature in 31 μl of nuclease-free water and measure DNA content with a NanoDrop spectrophotometer (expected concentration of at least 50 ng/μl; run additional PCRs if more DNA is needed).
4. Run the purified PCR products (100–200 ng) on a 1% (wt/vol) agarose gel (at 100 V) to check for the presence of a single band (gel extraction or PCR optimization is needed if more bands are present).
5. Mix the following components on ice in a nuclease-free 200-μl microcentrifuge tube. These PCR components are the same as in Step 1, except for the forward primer, which is replaced with a T7 promoter sequence primer. Keep the tubes on ice until PCR initiation.

| Component | Volume to add (μl) | Final concentration |
| --- | --- | --- |
| Phusion HF DNA polymerase | 0.5 | 1 U per 50-μl reaction |
| 5× Phusion HF buffer | 10 | 1× |
| dNTP mix (2.5 mM each) | 4 | 200 μM each |
| T7 forward (100 μM) | 1 | 2 μM |
| Universal reverse (100 μM) | 1 | 2 μM |
| Purified PCR product from Step 3 | Variable | 100 ng in a 50-μl reaction |
| Nuclease-free water | Up to 50 | - |

6. Repeat Steps 2–4.
7. Mix the following components on ice in a nuclease-free 200-μl microcentrifuge tube.

| Component | Volume | Final concentration |
| --- | --- | --- |
| 10× Transcription buffer (pre-equilibrate to room temperature) | 2 μl | 1× |
| SUPERase RNase inhibitor (20 U/μl) | 0.25 μl | 5 U per 20-μl reaction |
| Biotin RNA labeling mix (10×), 10 mM each | 2 μl | 1 mM each ATP, CTP, GTP; 0.65 mM UTP and 0.35 mM Biotin-16-UTP |
| Template DNA oligos produced from PCR (Step 6) | From 250 to 500 ng | 250–500 ng per 20-μl reaction |
| 10× MAXIscript RNA polymerase mix (15 U/μl T7 RNA polymerase) | 2 μl | 1.5 U/μl |
| Nuclease-free water | To 20 μl | - |

8. Incubate the tubes containing the transcription mix shown in Step 7 at 37 °C for 1 h.
9. Add 2 μl of Turbo DNase to 20 μl of transcription mix and incubate at 37 °C for 30 min.
10. To inactivate the DNase, add 1/5 volume of Turbo DNA-free DNase inactivation reagent.
11. Mix well by vortexing or pipetting, and incubate at room temperature for 5 min.
12. Transfer the reaction to 1.5-ml tubes and centrifuge at 13,000g for 90 s at room temperature.
13. Transfer to new 1.5-ml tubes (expected recovery volume ∼20 μl).
14. Add 80 μl of nuclease-free water to each tube.
15. Add 100 μl of phenol/chloroform/isoamyl alcohol (25:24:1; use Sigma-Aldrich cat. no. P2803 reagent with pH = 6.5–6.9) and mix until an emulsion is formed.
16. Centrifuge at 13,000g for 5 min at room temperature.
17. Transfer the upper aqueous phase (∼100 μl) to a new 1.5-ml tube and add nuclease-free water to 100 μl.
18. Add 100 μl of chloroform and centrifuge at 13,000g for 5 min at room temperature.
19. Transfer the upper phase (∼200 μl) to a new 1.5-ml tube.
20. Add 1/10 volume of 3 M CH3COONa and 1.5 μl of Dr GenTLE reagent.
21. Add 3 volumes of ice-cold 100% (vol/vol) ethanol, invert 30 times and incubate for 2 h at −80 °C.
22. Centrifuge at 17,000g for 30 min at 4 °C.
23. Remove the supernatant without disturbing the pellet and wash the pellet with 80% (vol/vol) ice-cold ethanol.
24. Centrifuge at 17,000g for 15 min at 4 °C.
25. Repeat Steps 23 and 24.
26. Remove the ethanol and air-dry for 15 min. Resuspend the pellet in 30 μl of nuclease-free water.
27. Use 1 μl to measure the RNA content by NanoDrop spectrophotometer and check product size, intactness and trace of free dNTPs by 1% (wt/vol) agarose gel electrophoresis (100 V). See Supplementary Figure 5a.
28. Store biotinylated cRNA oligos at −80 °C until use.
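For the in vitro transcription reaction of Step 7, the template and water volumes depend on the DNA concentration measured after the second PCR. The sketch below works them out for a 20-μl reaction using the fixed volumes from the Step 7 table; the function name and the example concentration are hypothetical.

```python
# Fixed component volumes (ul) from the Step 7 table.
FIXED_COMPONENTS_UL = {
    "10x transcription buffer": 2.0,
    "SUPERase RNase inhibitor": 0.25,
    "biotin RNA labeling mix": 2.0,
    "MAXIscript RNA polymerase mix": 2.0,
}
REACTION_VOLUME_UL = 20.0


def transcription_volumes(template_ng_per_ul, template_ng=250.0):
    """Return (template volume, water volume) in ul for one 20-ul reaction."""
    if template_ng_per_ul <= 0:
        raise ValueError("template concentration must be positive")
    template_ul = template_ng / template_ng_per_ul
    water_ul = REACTION_VOLUME_UL - sum(FIXED_COMPONENTS_UL.values()) - template_ul
    if water_ul < 0:
        # The template alone would exceed the space left in the reaction.
        raise ValueError("template too dilute for a 20-ul reaction")
    return template_ul, water_ul


# Example: template at 50 ng/ul (the minimum expected after Step 3), 250 ng input.
template_ul, water_ul = transcription_volumes(50.0)
print(f"template: {template_ul} ul, water: {water_ul} ul")
```

The fixed components occupy 6.25 μl, so at most 13.75 μl remain for template; a template that is too dilute to deliver 250 ng in that volume needs to be concentrated or re-amplified.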
Adenylation and ligation of 3′ adapter
Timing: ∼8 h
29. Assemble and mix the following components in a nuclease-free 200-μl microcentrifuge tube. Set up four reaction mixes to increase the recovery. The 3′ ssDNA oligo adapter was produced by Invitrogen and has a 3′ amino block. In this step, the 5′ phosphate end of the adapter is adenylated. Use the NEB 5′ DNA Adenylation Kit for enzymatic adenylation of single-stranded DNA linkers.

| Component | Volume (μl) | Final concentration |
| --- | --- | --- |
| 3′ ssDNA adapter (100 μM) | 1 | 5 pmol per μl (100 pmol per 20-μl reaction) |
| 10× 5′ DNA adenylation reaction buffer | 2 | 1× |
| 1 mM ATP (adenosine 5′ triphosphate) | 2 | 0.1 mM |
| Mth (Methanobacterium thermoautotrophicum) RNA ligase (50 pmol per μl) | 2 | 5 pmol per μl |
| Nuclease-free water | Up to 20 | - |

30. Incubate the tubes from Step 29 at 65 °C for 2 h.
31. Inactivate at 85 °C for 5 min and place on ice.
32. Combine the four reactions into one 1.5-ml tube.
33. Add an equal volume of phenol/chloroform/isoamyl alcohol (25:24:1, pH = 8.0; Sigma-Aldrich, cat. no. P2069) and mix until an emulsion forms.
34. Centrifuge at 12,000g for 5 min at room temperature.
35. Transfer the aqueous phase (∼80 μl) to a new 1.5-ml tube and add nuclease-free water to bring the volume to 100 μl.
36. Repeat Steps 33–35 once.
37. Mix an equal volume of chloroform with the aqueous phase. Mix briefly by inverting the tube ten times and centrifuge at 12,000g for 5 min at room temperature.
38. Transfer the upper phase (∼80 μl) to a new 1.5-ml tube.
39. Add 1/10 volume of 3 M CH3COONa at pH 5.5 and 1.5 μl of Dr GenTLE reagent.
40. Add 3 volumes of 100% (vol/vol) ethanol and mix slowly by inverting 30 times.
41. Incubate the tubes on ice for 1 h and then at −80 °C for 3 h.
42. Centrifuge at 17,000g for 45 min at 4 °C.
43. Remove the solution and wash the pellet with 500 μl of 80% (vol/vol) ice-cold ethanol.
44. Centrifuge at 17,000g for 20 min at 4 °C.
45. Repeat Steps 43 and 44.
46. Remove all solutions and air-dry the pellet. Resuspend in 20 μl of nuclease-free water.
47. Use 1 μl to measure the ssDNA concentration by NanoDrop, run on a 20% (vol/vol) polyacrylamide gel to check for nucleotide incorporation (Supplementary Fig. 5b) and dilute the adenylated 3′ adapter to a working concentration of 2 μM.
Prepare the 3′ ligation enzyme mix in PCR tubes, as described in the table below, mix thoroughly by pipetting up and down six to eight times, and spin down briefly (500g, 4 °C, 15 s). Keep the enzyme mix on ice.
Component Volume (μl) Final concentration 10× T4 RNA ligase buffer 1 1× T4 RNA ligase 2 truncated (200U/μl) 1 200 U per 20-μl reaction SUPERase RNase inhibitor (20U/μl) 1 20 U per 20-μl reaction
Prepare the 3′ adapter and RNA mix in PCR tubes, mix thoroughly by pipetting up and down six to eight times, and spin down briefly (500g, 4 °C, 15 s).
Component Volume (μl) Final concentration RNA sample Up to 6 1 μg of total RNA or 100 ng of enriched sRNA Adenylated 3′ linker (2 μM), from Step 47 1 2 μM Spike-in Optional Nuclease-free water Up to 7
Incubate the 3′ adapter and RNA mix from Step 49 at 72 °C for 2 min and immediately after chill on ice for 2 min.
Transfer 3 μl of the 3′ ligation enzyme mix from Step 48 to each of the denatured RNA-3′ adapter tubes on ice and mix by pipetting.
Incubate the reaction mix at 25 °C for 1 h and then keep on ice.
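When several samples are processed in parallel, the per-reaction volumes of the 3′ ligation enzyme mix can be scaled into a single master mix. The following is a minimal Python sketch of that arithmetic. The function name and the 10% pipetting overage are illustrative assumptions and are not specified by the protocol.

```python
# Per-reaction volumes (μl) of the 3′ ligation enzyme mix, copied from the table above.
LIGATION_ENZYME_MIX_UL = {
    "10x T4 RNA ligase buffer": 1.0,
    "T4 RNA ligase 2 truncated (200 U/ul)": 1.0,
    "SUPERase RNase inhibitor (20 U/ul)": 1.0,
}


def scale_master_mix(per_reaction_ul, n_reactions, overage=0.10):
    """Return master-mix volumes (μl) for n_reactions plus a fractional overage.

    The 10% default overage is an illustrative assumption, not a protocol value.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


if __name__ == "__main__":
    mix = scale_master_mix(LIGATION_ENZYME_MIX_UL, n_reactions=8)
    for name, vol in mix.items():
        print(f"{name}: {vol} ul")
    print(f"total: {round(sum(mix.values()), 2)} ul (3 ul is dispensed per reaction)")
```

The same helper could be applied to the other reagent tables in the procedure by swapping in their per-reaction volumes.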
Removal of 5′ guanosine cap of 3′ linker-ligated RNA and ligation of 5′ RNA adapter
Timing: ∼2 h
Prepare the master mix of TAP enzyme.
Component | Volume (μl) | Final concentration
TAP enzyme (5 U/μl) | 0.1 | 0.5 U per 10-μl reaction
10× TAP reaction buffer | 1.1 | 1×
Add 1.2 μl of TAP enzyme master mix to each of the 10-μl 3′ linker-ligated RNA tubes from Step 52 and mix by pipetting.
Incubate at 37 °C for 1 h.
Set up the ATP-RNA ligase 1 enzyme mix on ice. Mix by flicking the tube, spin down (500g, 4 °C, 15 s) and keep on ice.
Component | Volume (μl) | Final concentration
10 mM ATP | 1.4 | 1 mM
NEB T4 RNA ligase 1 (20 U/μl) | 1 | 20 U per 14.5-μl reaction
Incubate a 5 μM stock of 5′ phosphorylated RNA adapter at 72 °C for 2 min and immediately chill on ice for at least 2 min.
On ice, add 1 μl of the 5′ phosphorylated RNA adapter to each 2.4-μl ATP-RNA ligase 1 reaction mix from Step 56, resulting in 3.4 μl of 5′ adapter mix. Mix by pipetting and spin down (500g, 4 °C, 15 s).
Add 11.1 μl of the TAP-treated 3′ linker-ligated RNA from Step 55 to each tube containing 3.4 μl of 5′ adapter mix.
Incubate at 20 °C for 1 h and then keep on ice.
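The volumes in this section build on one another, so it can help to track them explicitly. The sketch below, in Python, follows the reaction volume from the 3′ linker-ligated RNA through TAP treatment to the 5′ ligation and checks the ATP concentration with C1V1 = C2V2. The variable and function names are hypothetical. All numbers are taken from the steps and tables above.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """C1*V1 = C2*V2 solved for C2; units of the result follow stock_conc."""
    if total_volume_ul <= 0:
        raise ValueError("total_volume_ul must be positive")
    return stock_conc * stock_volume_ul / total_volume_ul


# Volumes (μl) taken from the steps and tables of this section.
ligated_rna_ul = 10.0            # 3′ linker-ligated RNA
tap_master_mix_ul = 0.1 + 1.1    # TAP enzyme + 10× TAP reaction buffer
tap_reaction_ul = ligated_rna_ul + tap_master_mix_ul

atp_ligase_mix_ul = 1.4 + 1.0    # 10 mM ATP + T4 RNA ligase 1
five_prime_mix_ul = atp_ligase_mix_ul + 1.0   # plus 1 μl of 5′ RNA adapter
transferred_ul = 11.1            # volume carried into the 5′ ligation
ligation_ul = transferred_ul + five_prime_mix_ul

# ATP concentration (mM) in the assembled 5′ ligation reaction.
atp_mM = final_concentration(10.0, 1.4, ligation_ul)

if __name__ == "__main__":
    print(f"TAP reaction volume: {tap_reaction_ul:.1f} ul")
    print(f"5' ligation volume:  {ligation_ul:.1f} ul")
    print(f"ATP in ligation:     {atp_mM:.2f} mM")
```

Under these numbers the 5′ ligation comes to 14.5 μl and ATP to roughly 1 mM, in line with the values listed in the table.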
Synthesis of first-strand cDNA
Timing: ∼2 h
Split each adapter-ligated RNA tube from Step 60 into two tubes (7.2 μl each) in order to set up two cDNA synthesis reactions for each sample (to increase per-sample yield). Keep on ice.
Prepare the cDNA synthesis mix.
Component | Volume (μl) | Final concentration
5× PrimeScript buffer | 2.6 | 1×
dNTP, 10 mM each | 1.0 | 0.77 mM
SUPERase RNase inhibitor (20 U/μl) | 0.2 | 4 U in a 13-μl reaction
PrimeScript RT enzyme (200 U/μl) | 1 | 200 U in a 13-μl reaction
Incubate 1 μl of 20 μM first-strand cDNA primer per reverse transcription reaction at 65 °C for 5 min and immediately chill on ice for 2 min.
On ice, add 1 μl of the heat-denatured primer to each of the 7.2-μl adapter-ligated RNA tubes. Mix by flicking and spin down (500g, 4 °C, 15 s).
Add 4.8 μl of cDNA synthesis enzyme mix from Step 62 to each 8.2 μl of primer-RNA mix. Mix by pipetting.
Incubate at 44 °C for 1 h.
Combine the two cDNA synthesis tubes (split in Step 61) of the same sample in one tube (total volume = 26 μl) and then add 24 μl of nuclease-free water to reach a total volume of 50 μl per tube.
Purify the cDNA using a MinElute PCR Purification Kit, following the manufacturer's instructions. Elute in 20 μl of buffer EB (included in the kit) per column.
Amplification of targeted cDNAs before capture
Timing: ∼2 h
Mix the following components on ice in a nuclease-free 200-μl microcentrifuge tube.
Component | Volume (μl) | Final concentration
5× Phusion HF buffer | 10 | 1×
2.5 mM dNTPs | 4 | 0.2 mM
Phusion HF enzyme (2 U/μl) | 0.5 | 1 U per 50-μl reaction
PCR Primer FWD (100 μM) | 1 | 2 μM
PCR Primer REV (100 μM) | 1 | 2 μM
Template cDNAs from Step 68 | 20 | Half of cDNA
Nuclease-free water | Up to 50 |
Perform PCR on the reactions from Step 69 using the following cycling conditions:
Cycle number | Denature | Anneal | Extend
1 | 98 °C for 2 min | |
2–23 | 98 °C for 30 s | 55 °C for 30 s | 72 °C for 3 min for total RNA capture experiment (72 °C for 30 s for sRNA capture experiment)
24 | | | 72 °C for 5 min
Purify the PCR product using a MinElute PCR Purification Kit, following the manufacturer's instructions, and elute in 31 μl of nuclease-free water.
Run 1 μl of the precaptured PCR products from Step 70 on a Bioanalyzer DNA 1000 chip to check for the quality of the libraries and the contribution of linker dimer peaks (Supplementary Fig. 5c).
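To plan the day, the programmed hold time of the precapture PCR can be estimated from the cycling table. The Python sketch below encodes the two variants of the program (3-min extension for total RNA capture, 30-s extension for sRNA capture) and sums the holds. The class and constant names are hypothetical, and ramping between temperatures is ignored, so actual run times on a given instrument will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    initial_denature_s: int
    cycles: int
    denature_s: int
    anneal_s: int
    extend_s: int
    final_extend_s: int

    def hold_time_s(self):
        """Sum of programmed hold times; ramping between temperatures is ignored."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.initial_denature_s + self.cycles * per_cycle + self.final_extend_s


# Cycle numbers 2–23 in the table correspond to 22 amplification cycles.
PRECAPTURE_TOTAL_RNA = CyclingProgram(120, 22, 30, 30, 180, 300)
PRECAPTURE_SRNA = CyclingProgram(120, 22, 30, 30, 30, 300)

if __name__ == "__main__":
    for label, prog in [("total RNA", PRECAPTURE_TOTAL_RNA), ("sRNA", PRECAPTURE_SRNA)]:
        print(f"{label}: {prog.hold_time_s() / 60:.0f} min of programmed holds")
```

With these settings the holds add up to 95 min for the total RNA program and 40 min for the sRNA program.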
Preparation of hybridization reactions
Timing: ∼30 min
Thaw the following reagents on ice: 50× Denhardt's buffer, human Cot-1 DNA (1 mg/ml) and salmon sperm DNA (10 mg/ml).
Prepare the hybridization buffer (HB) in a 1.5-ml tube (40 μl per hybridization reaction).
Component | Volume (μl) | Final concentration (at Step 85)
20× SSPE | 20.4 | 5.5×
50× Denhardt's solution | 8.2 | 5.5×
10% (wt/vol) SDS | 0.8 | 0.1%
0.5 M EDTA | 0.8 | 5 mM
Nuclease-free water | 9.8 |
Add the HB in PCR-strip wells (40 μl per well).
Prepare the DNA target strip, by combining the following in a PCR strip.
Component | Volume | Final concentration
Human Cot-1 (1 μg/μl) | 2.5 μl | 2.5 μg per hybridization reaction
Salmon sperm (1 μg/μl) | 2.5 μl | 2.5 μg per hybridization reaction
Block reverse oligo (200 μM) | 4.5 μl | 4.5 μl per 1 μg of precaptured PCR products
Block forward oligo (200 μM) | 4.5 μl | 4.5 μl per 1 μg of precaptured PCR products
Precaptured PCR products | 1 μg | 1 μg per hybridization reaction
Nuclease-free water | Up to 20 μl |
Mix the reagents of the DNA target strip well with a multichannel pipette and place on ice.
Prepare the bait strip on ice, by adding 18 μl of biotinylated cRNA oligos (from Step 28) and 2 μl of SUPERase RNase inhibitor (20 U/μl) per strip well.
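Because the DNA target well is capped at 20 μl and the blocking reagents already occupy 14 μl, the volume available for 1 μg of precaptured PCR products is limited. The Python sketch below works out the PCR product and water volumes from a measured concentration. The function name is hypothetical and the reagent volumes are those listed in the DNA target strip table. What to do when the product is too dilute is not covered here.

```python
# Fixed reagent volumes (μl) per DNA target well, from the table above.
FIXED_REAGENTS_UL = {
    "human Cot-1 (1 ug/ul)": 2.5,
    "salmon sperm DNA (1 ug/ul)": 2.5,
    "block reverse oligo (200 uM)": 4.5,
    "block forward oligo (200 uM)": 4.5,
}
WELL_VOLUME_UL = 20.0
TARGET_MASS_NG = 1000.0  # 1 μg of precaptured PCR products per hybridization


def target_strip_volumes(pcr_conc_ng_per_ul):
    """Return (PCR product μl, water μl) for one DNA target well.

    Raises ValueError if 1 μg of product does not fit in the 20-μl well.
    """
    if pcr_conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    pcr_ul = TARGET_MASS_NG / pcr_conc_ng_per_ul
    water_ul = WELL_VOLUME_UL - sum(FIXED_REAGENTS_UL.values()) - pcr_ul
    if water_ul < -1e-9:
        raise ValueError(f"{pcr_ul:.1f} ul of PCR product exceeds the space left in the well")
    return round(pcr_ul, 2), round(max(water_ul, 0.0), 2)


if __name__ == "__main__":
    pcr_ul, water_ul = target_strip_volumes(250.0)  # 250 ng/μl is a made-up example
    print(f"PCR product: {pcr_ul} ul, water: {water_ul} ul")
```

In this arithmetic, 1 μg fits only if the purified PCR product is at roughly 167 ng/μl or higher, since 6 μl remain after the fixed reagents.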
Hybridization
Timing: ∼2 d
Set three thermocyclers to constant temperatures of 65, 95 and 37 °C, respectively.
Place the HB strip from Step 75 in a 65 °C heat block for 15–20 min.
5 min after placing the HB strip at Step 80, place the DNA target strip in the 95 °C thermocycler for 10 min.
2 min before the end of Step 81, place the bait strip on the 65 °C thermocycler.
1 min before the end of Step 81, with a multichannel pipette, transfer the HB-strip content quickly but carefully to the bait strip (both at 65 °C) and mix gently by pipetting.
When Step 81 has ended, with a multichannel pipette, transfer the DNA target strip content (at 95 °C) to the HB/bait PCR strip at 65 °C, mix well by pipetting gently to minimize foam formation and immediately proceed to Step 85.
Transfer the final hybridization strips from Step 84 to the 37 °C thermocycler and incubate for 48 h.
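The hybridization steps are timed relative to one another, which is easy to misread at the bench. The Python sketch below lays out the relative timings described above on a single clock, with t = 0 defined as the moment the HB strip is placed at 65 °C. The function and constant names are hypothetical. The offsets are taken directly from the step descriptions.

```python
TARGET_START_MIN = 5       # DNA target strip goes to 95 °C 5 min after the HB strip
TARGET_DENATURE_MIN = 10   # duration of the 95 °C denaturation


def hybridization_schedule():
    """Return (minute, action) pairs, with t = 0 when the HB strip goes to 65 °C."""
    target_end = TARGET_START_MIN + TARGET_DENATURE_MIN
    events = [
        (0, "HB strip to 65 C"),
        (TARGET_START_MIN, "DNA target strip to 95 C"),
        (target_end - 2, "bait strip to 65 C"),
        (target_end - 1, "transfer HB into bait strip at 65 C"),
        (target_end, "transfer DNA target into HB/bait strip, then move to 37 C"),
    ]
    return sorted(events)


if __name__ == "__main__":
    for minute, action in hybridization_schedule():
        print(f"t = {minute:2d} min: {action}")
```

Printing the schedule gives a short checklist that can be set against a timer during the transfers.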
Streptavidin bead capture of biotinylated cRNA–cDNA hybrid
Timing: ∼4 h
Incubate the washed beads from Box 1 at 37 °C for 5 min.
Add 1 ml of wash buffer 2 (WB2) and 1 ml of wash buffer 1 (WB1) to separate 1.5-ml tubes and keep at 37 °C.
Add 100 μl of washed beads to each hybridization strip tube from Step 85 and keep at 37 °C. Mix by pipetting 20 times with a multichannel pipette.
Incubate for 30 min at 37 °C. Mix gently by pipetting every 10 min with a multichannel pipette.
Place strips on a Dynabeads MPC-S plate and let them sit for 3 min at 37 °C.
Remove the supernatant and discard it, without disturbing the beads.
Place the strip back at 37 °C, add 180 μl of WB1 per tube and mix gently by pipetting 20 times with a multichannel pipette.
Incubate at 37 °C in the thermocycler for 15 min (mix gently with a multichannel pipette ten times every 5 min).
Repeat Steps 90 and 91.
Place the strip back at 37 °C, add 180 μl of WB2 and mix gently by pipetting 20 times with a multichannel pipette.
Incubate at 37 °C in the thermocycler for 10 min (mix gently with a multichannel pipette ten times after 5 min).
Place the strip on a Dynabeads MPC-S plate for 3 min at 37 °C.
Remove the supernatant and discard it, without disturbing the beads.
Repeat Steps 95–98 two additional times.
Add 50 μl of freshly prepared 0.1 M NaOH to each strip tube and mix by pipetting to elute the cDNA from the beads.
Incubate for 10 min at room temperature.
Place the strip on a Dynabeads MPC-S plate magnet for 3 min.
Transfer the supernatant to a tube containing 50 μl of 1 M Tris-HCl at pH 8.5 and discard the tubes with the beads.
Place the tubes on a magnetic stand for 5 min to remove residual beads.
Transfer the supernatant to a new 1.5-ml tube.
Purify the captured cDNA with the MinElute PCR Purification Kit, following the manufacturer's instructions, and elute in 21 μl of EB elution buffer.
(Optional) Amplification of post-capture cDNA and second round of streptavidin bead capture of biotinylated cRNA–cDNA hybrid
Timing: ∼2 d
Mix the following components on ice in a nuclease-free 200-μl microcentrifuge tube.
Component | Volume (μl) | Final concentration
5× Phusion HF buffer | 10 | 1×
2.5 mM dNTPs | 4 | 0.2 mM
Phusion HF enzyme (2 U/μl) | 0.5 | 1 U per 50-μl reaction
PCR primer forward (100 μM) | 1 | 2 μM
PCR primer reverse (100 μM) | 1 | 2 μM
Template cDNAs | 20 | All or half of post-capture cDNA
Nuclease-free water | 13.5 |
Perform PCR on the reactions from Step 107 using the following cycling conditions:
Cycle number | Denature | Anneal | Extend
1 | 98 °C for 2 min | |
2–11 | 98 °C for 30 s | 55 °C for 30 s | 72 °C for 3 min for total RNA capture experiment (72 °C for 30 s for sRNA capture experiment)
12 | | | 72 °C for 5 min
Repeat Steps 71–106.
Final library PCR, indexing and library quantification
Timing: ∼5 h
Mix the following components on ice in a nuclease-free 200-μl microcentrifuge tube.
Component | Volume (μl) | Final concentration
5× Phusion HF buffer | 10 | 1×
PCR primer forward (10 μM) | 1 | 0.2 μM
ScriptSeq reverse index adapter primer (10 μM) | 1 | 0.2 μM
2.5 mM dNTP | 4 | 0.2 mM
Phusion Hot Start DNA polymerase (2 U/μl) | 1 | 2 U per 50-μl reaction
Nuclease-free water | Up to 50 μl |
Perform PCR on the reactions from Step 110 using the following cycling conditions:
Cycle number | Denature | Anneal | Extend
1 | 98 °C for 2 min | |
2–16 | 98 °C for 30 s | 55 °C for 30 s | 72 °C for 3 min for total RNA capture experiment (72 °C for 30 s for sRNA capture experiment)
17 | | | 72 °C for 10 min
For each of the 50-μl PCR reaction tubes, add 0.7 μl of exonuclease I (20 U/μl). Mix by pipetting and incubate at 37 °C for 30 min.
Use AMPure beads for final library purification, following the manufacturer's instructions to remove adapter dimers. The ratio for AMPure purification should be 1.8 (AMPure beads):1(DNA). Elute in 21 μl of nuclease-free water.
Use 1 μl from each tube on an Agilent High Sensitivity DNA chip for quality control to check for sizes, concentrations and adapter dimer contamination (Supplementary Fig. 6). Each sample may be run in duplicate or triplicate to increase the measurement accuracy.
Use a small aliquot from each tube for library quantification by following the KAPA qPCR instructions (https://www.kapabiosystems.com/document/kapa-library-quantification-illumina-tds/).
Calculate the average concentration obtained in Steps 114 and 115 in order to have a more precise measurement of the actual concentrations to be run on a MiSeq sequencer.
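Averaging the two estimates requires both to be in the same units. The Python sketch below converts a dsDNA mass concentration to molarity using the common approximation of 660 g/mol per base pair and then takes the arithmetic mean of the chip-based and qPCR-based values. The function names and the example numbers are hypothetical. The mean fragment length would come from the chip trace.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one double-stranded DNA base pair


def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA mass concentration (ng/μl) to molarity (nM)."""
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def consensus_nM(chip_nM, qpcr_nM):
    """Arithmetic mean of the two independent concentration estimates."""
    return (chip_nM + qpcr_nM) / 2.0


if __name__ == "__main__":
    # Made-up example: 20 ng/μl library with a 200-bp mean fragment length.
    chip_nM = ng_per_ul_to_nM(20.0, 200.0)
    print(f"chip estimate: {chip_nM:.1f} nM")
    print(f"consensus with a 160 nM qPCR estimate: {consensus_nM(chip_nM, 160.0):.1f} nM")
```

A large disagreement between the two estimates may be worth investigating before averaging, although the protocol itself does not set a threshold.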
Real-time PCR validation of libraries before sequencing
Timing: ∼2 h
Prepare a 1:500 dilution of the final library in nuclease-free water (the library concentration is often in the range of 10–40 ng/μl). Depending on the estimated concentration, the dilution factor can be adjusted to reach an optimal threshold cycle (Ct) value.
Mix the following components on ice in a nuclease-free 200-μl microcentrifuge tube.
Component | Volume (μl) | Final concentration
SYBR Premix Ex Taq (Tli RNaseH plus) 2× | 10 | 1×
Validation forward primer (10 μM) | 0.4 | 0.2 μM
Validation reverse primer (10 μM) | 0.4 | 0.2 μM
Final library diluted 1:500 from Step 117 | 2 |
Nuclease-free water | 7.2 |
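If the 1:500 dilution gives a Ct outside the desired range, the dilution factor can be adjusted as a rough first guess from the relationship between template amount and Ct. The Python sketch below assumes that each cycle multiplies the product by (1 + efficiency), with perfect doubling as the default. The function name, the example Ct values and the idealized efficiency are all assumptions. The protocol does not state a specific target Ct.

```python
def adjusted_dilution(current_dilution, observed_ct, target_ct, efficiency=1.0):
    """Dilution factor expected to move observed_ct to target_ct.

    Assumes each cycle multiplies product by (1 + efficiency); efficiency=1.0
    (perfect doubling) is an idealization that real assays only approximate.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    fold_per_cycle = 1.0 + efficiency
    return current_dilution * fold_per_cycle ** (target_ct - observed_ct)


if __name__ == "__main__":
    # Made-up example: Ct of 12 at 1:500, aiming for a Ct of 15.
    print(f"suggested dilution: 1:{adjusted_dilution(500, 12, 15):.0f}")
```

A Ct that is three cycles too early therefore suggests diluting about eightfold further, and a Ct that is too late suggests a less dilute sample.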
MiSeq sequencing
Timing: ∼1 d
Pool all barcoded libraries into one MiSeq lane at a final concentration of 2 nM. The number of samples per lane depends on the sequencing depth required. The protocol can detect rare RNAs with 12 samples per lane for a MiSeq 150 kit. At least 20 million reads will be produced. Most reads will be mapped, and the number of reads mapped to rRNA should be small (Supplementary Fig. 7).
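One way to reach a 2 nM pool is to dilute each library to 2 nM and then combine equal volumes, which also keeps the libraries equimolar. The Python sketch below computes the dilution for one library and the expected read share per sample. The function names, the 50-μl working volume and the equal-volume pooling strategy are assumptions rather than protocol requirements.

```python
def dilute_to_target(stock_nM, target_nM=2.0, final_volume_ul=50.0):
    """Return (library μl, diluent μl) giving target_nM in final_volume_ul."""
    if stock_nM < target_nM:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nM * final_volume_ul / stock_nM
    return library_ul, final_volume_ul - library_ul


def reads_per_sample(total_reads, n_samples):
    """Expected reads per library if all libraries are pooled equimolarly."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return total_reads // n_samples


if __name__ == "__main__":
    lib_ul, diluent_ul = dilute_to_target(100.0)  # made-up 100 nM library
    print(f"library: {lib_ul:.2f} ul, diluent: {diluent_ul:.2f} ul")
    print(f"reads per sample: {reads_per_sample(20_000_000, 12):,}")
```

With 20 million reads shared across 12 samples, each library would receive on the order of 1.7 million reads under equimolar pooling.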
Troubleshooting advice can be found in Table 2.
Steps 1–28, synthesis of biotinylated cRNA oligos from cDNA oligos: ∼9 h
Steps 29–52, adenylation and ligation of 3′ adapter: ∼8 h
Steps 53–60, removal of 5′ guanosine cap of 3′ linker-ligated RNA and ligation of 5′ RNA adapter: ∼2 h
Steps 61–68, synthesis of first-strand cDNA: ∼2 h
Steps 69–72, amplification of targeted cDNAs before capture: ∼2 h
Steps 73–78, hybridization setup: ∼30 min
Steps 79–85, hybridization: ∼2 d
Steps 86–106, streptavidin bead capture of biotinylated cRNA–cDNA hybrid: ∼4 h
Steps 107–109, (optional) amplification of post-capture cDNA and second round of streptavidin bead capture of biotinylated cRNA–cDNA hybrid: ∼2 d
Steps 110–116, final library PCR, indexing and library quantification: ∼5 h
Steps 117 and 118, real-time PCR confirmation of libraries before sequencing: ∼2 h
Step 119, MiSeq sequencing: ∼1 d
Box 1, streptavidin bead preparation: ∼30 min
Enrichment levels can be estimated by comparing the sequencing depth of the TEsR targets with the depth (normalized by library size to counts per million) generated by a standard sequencing protocol, which is a common estimation approach27. The enrichment ranged from 400-fold (for abundant sRNAs such as miR29b1, Snord68 and Snord70; Fig. 2a,b) to 30,000-fold (for rare tDDRNAs; see Supplementary Fig. 8 and a recent study29). We show an example in which the total reads for the three sRNA targets were approximately four times higher than the total of all remaining 5,515 annotated mouse sRNAs, including all miRNAs, small rRNAs, snoRNAs and snRNAs (GENCODE vGRCm38.p3) (Fig. 2a). Remarkably, the reads covering the three sRNA targets constituted 78–83% of the total read counts in the libraries. By contrast, in control standard libraries, these three sRNAs accounted for only ∼0.2% of the total reads (Fig. 2b), demonstrating that TEsR achieved ∼400-fold enrichment of these three sRNA targets. Moreover, a marked shift in the detection ranks of the three captured sRNAs was evident when comparing captured with noncaptured libraries (Fig. 2c,d), indicating that the TEsR approach performs well and cost-effectively even on a smaller sequencer such as the Illumina MiSeq.
For capturing different isoforms, we show an example in which TEsR detected variant forms that extend from 3 to 16 nt from both the 5′ and 3′ ends of three sRNAs (more details in Supplementary Table 1). miR29b-1 (with the annotated precursor form at 71 nt long) has two mature cleavage forms, mmu-miR-29b-1-5p (22 nt) and mmu-miR-29b-3p (23 nt), with a stem-loop sequence in the middle (Supplementary Table 1 and Supplementary Fig. 3)32. From TEsR deep-sequencing data (∼400 times deeper than a standard sRNA sequencing protocol), we generated coverage plots of mapped reads to the mir29B1 reference sequence. Two independent captured libraries consistently defined the start and end sites of both the miR29b-1-5p (blue) and the miR29b1-3p (green) forms (Fig. 4b). Notably, the method also detected the mir-29b-1 pre-miRNA form, which contains the 5′ and 3′ miRNAs at its two ends, linked by the middle loop (Supplementary Fig. 3).
The protocol is based strictly on the target enrichment of sRNA approach used in our original publication for the study of transcription, and therefore its application as specified will generate similar data29. We applied TEsR to detect and quantify a novel, low-abundance and functional telomeric RNA species needed for DNA damage response activation at dysfunctional telomeres, named tDDRNAs29. The sequencing depth generated by TEsR (∼10,000–50,000 reads per sample of the three biological replicates) not only confirmed the existence of tDDRNAs but also enabled differential expression analysis of these rare sRNA species, showing that telomere deprotection induces the accumulation of tDDRNAs generated from telomere DNA. Importantly, TEsR distinguished sRNA species generated from the transcription of both telomere strands (i.e., sense and antisense telomeric RNA). TEsR also enabled the characterization of tDDRNA length, confirming that the majority of these molecules are 17–33 nt long (Supplementary Fig. 8a). Moreover, TEsR generated sequence information at single-base resolution for all isoforms. Through the use of deep HiSeq sequencing, we demonstrated that from 141.3 million reads of total RNA libraries and 66 million reads of sRNA libraries, only four reads were mapped to tDDRNA. By contrast, TEsR libraries sequenced on a low-throughput MiSeq yielded 5,300 reads mapped to the target RNA from as few as 3 million reads, representing an ∼30,000-fold enrichment of sequencing depth (Supplementary Fig. 8b).
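The enrichment estimates in this section rest on a ratio of library-size-normalized depths. The Python sketch below expresses that calculation as counts per million (CPM) in the captured library divided by CPM in the standard library. The function names are hypothetical. For the tDDRNA example, the sketch assumes that the four reads are compared against the 66 million sRNA-library reads. The text does not spell out the denominator, so this is an assumption that happens to give a value close to the reported figure.

```python
def counts_per_million(target_reads, library_reads):
    """Normalize a read count by library size to counts per million."""
    if library_reads <= 0:
        raise ValueError("library_reads must be positive")
    return target_reads * 1e6 / library_reads


def fold_enrichment(capt_target, capt_total, std_target, std_total):
    """Ratio of target CPM in the captured library to CPM in the standard library."""
    std_cpm = counts_per_million(std_target, std_total)
    if std_cpm == 0:
        raise ValueError("target undetected in the standard library; ratio is undefined")
    return counts_per_million(capt_target, capt_total) / std_cpm


if __name__ == "__main__":
    # Three abundant targets: about 80% of captured reads versus about 0.2% of standard reads.
    print(f"abundant targets: {fold_enrichment(0.80, 1.0, 0.002, 1.0):.0f}-fold")
    # tDDRNA: 5,300 of 3 million captured reads versus 4 of 66 million sRNA reads (assumed denominator).
    print(f"tDDRNA: {fold_enrichment(5300, 3e6, 4, 66e6):.0f}-fold")
```

Because the standard-library count for rare targets can be only a handful of reads, estimates at the high end of the range are sensitive to small changes in that count.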
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We thank all P.C. and F.d.d.F. laboratory members, especially G. Pascarella, K. Hashimoto and A. Bonetti, for insightful discussions. F.d.d.F.'s laboratory is supported by Associazione Italiana per la Ricerca sul Cancro (application 12971), AriSLA (project 'DDRNA and ALS'), Associazione Italiana per la Ricerca sul Cancro (AIRC; application 12971), Worldwide Cancer Research (Association for International Cancer Research (AICR), Research Infrastructure Fund (RIF) 14-1331), Research EPIGEN, Fondazione Cariplo (grants 2014-1215 and 2014-0812), Progetti di Ricerca di Interesse Nazionale (PRIN) 2010–2011, Fondazione Telethon (GGP12059), the Human Frontier Science Program (contract RGP 0014/2012) and a European Research Council advanced grant (322726). P.C.'s lab is supported by a Research Grant from the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) to the RIKEN Center for Life Science Technologies. J.A. is supported by Marie Curie Initial Training Networks (FP7 PEOPLE 2012 ITN (CodeAge project no: 316354)). F.R. is supported by Fondazione Italiana per la Ricerca sul Cancro (FIRC, application number 12476).
Integrated supplementary information
Supplementary Figures 1–8 and Supplementary Tables 1 and 2.