Main

Many RNA modifications are enzymatically installed chemical modifications of either the ribose or the nucleobase of nucleotides. Approximately 170 types of RNA modification are known1, and they are essential for diverse biological processes such as genetic recoding2, pre-messenger RNA (mRNA) splicing3, mRNA export4, RNA folding5 and chromatin state regulation6. Accumulating evidence indicates that many RNA modifications are associated with cancers7, neurological disorders8 and other human diseases9, and they may thus serve as diagnostic markers or therapeutic targets. Recent reports also indicate that RNA modifications are associated with grain yield10. There is, however, an urgent unmet need to map diverse RNA modifications accurately, a task complicated by the similarity of their chemical structures11.

Analysis of RNA modifications can be performed by thin-layer chromatography12, high-performance liquid chromatography coupled with UV spectrophotometry13 or high-performance liquid chromatography coupled with mass spectrometry14. However, none of these methods provides sequence information. Methods based on next-generation sequencing allow transcriptome-wide mapping of RNA modifications15, but they rely on either specific antibodies16 or chemical treatment of RNA17. Because these methods are typically tailored to a single modification, only a limited set of modifications can be detected by sequencing. These include pseudouridine (ψ)18, N6-methyladenosine (m6A)19, 5-methylcytidine (m5C)20, N1-methyladenosine (m1A)21, N7-methylguanosine (m7G)22, 5-hydroxymethylcytosine23, N6,2′-O-dimethyladenosine16, N4-acetylcytidine24 and A-to-I editing25. Third-generation sequencing techniques, including those developed by Pacific Biosciences and Oxford Nanopore Technologies, may overcome these shortcomings26. In Pacific Biosciences sequencing, RNA modifications are identified from variations in the time between successive base incorporations27. In nanopore sequencing by Oxford Nanopore Technologies, RNA modifications are reported as variations in the ionic current28 or the event dwell time29. However, the nanopore strand-sequencing strategy30 still suffers from low spatial resolution, which worsens further when modified nucleotides are close neighbours31.

Exo-sequencing is a different strategy, in which nucleotides released one at a time by an exonuclease are read sequentially by a nanopore. However, it requires a high-resolution nanopore that can unambiguously recognize all nucleotides and their major modifications. A cyclodextrin-embedded α-haemolysin (α-HL)32,33 was previously reported to perform this task, but the results did not show true discrimination between cytidine diphosphate and uridine diphosphate, and identification of RNA modifications was not demonstrated32. This low resolution probably results from the cylindrical lumen geometry of α-HL34. Instead, Mycobacterium smegmatis porin A (MspA)35, a conical pore widely applied in nanopore sequencing36, single-molecule chemistry37 and structural profiling of biomacromolecules38, is more advantageous. Phenylboronic acid (PBA) is known to form reversible covalent bonds with 1,2- or 1,3-diols39. PBA introduced into a nanopore lumen has previously been used to detect various cis-diol-containing analytes such as saccharides40, epinephrine and Remdesivir41. However, neither a hetero-octameric MspA nanopore containing a single PBA adaptor nor nanopore identification of a large variety of epigenetically modified nucleoside monophosphates (NMPs) has been reported.

NMP identification using a PBA-modified MspA

To build a hetero-octameric MspA, two genes coding for N90C MspA-H6 and M2 MspA-D16H6, respectively (Supplementary Table 1), were custom-synthesized and simultaneously inserted into a pETDuet-1 co-expression vector (Methods). The N90C MspA-H6 gene encodes an MspA monomer carrying a single cysteine at the pore constriction, whereas the M2 MspA-D16H6 gene encodes a monomer without any cysteine. Hetero-octameric MspAs containing different proportions of the two gene products were generated by prokaryotic co-expression (Supplementary Fig. 1) and characterized by gel electrophoresis (Supplementary Figs. 2 and 3). The only desired assembly is the hetero-octamer consisting of one N90C MspA-H6 unit and seven M2 MspA-D16H6 units, referred to as (N90C)1(M2)7 (Fig. 1a). (N90C)1(M2)7 was separated from the other MspA hetero-octamers by high-resolution gel electrophoresis followed by gel extraction (Methods, Supplementary Figs. 2 and 3). Subsequently, 3-(maleimide)phenylboronic acid (MPBA) was allowed to react with the sole cysteine of (N90C)1(M2)7 (Fig. 1b). This reaction was observed in real time at the single-molecule level by single-channel recording in a 1.5 M KCl, 10 mM MOPS, pH 7.0 buffer (Fig. 1c and Methods). With a single (N90C)1(M2)7 inserted in the membrane and a constant +200 mV bias applied, the open-pore current of (N90C)1(M2)7 (Io) was ~620 pA. Upward noise spikes, which result from the cysteine residue at the pore constriction as previously reported42, were also observed. Upon addition of MPBA to cis at a final concentration of 1 mM, a single current drop of ~100 pA was immediately observed, and the upward spikes disappeared at the same time, indicating that the cysteine had been occupied and the pore constriction had been successfully modified with PBA.
For simplicity, this PBA-modified MspA is referred to as MspA-PBA. Under the same conditions, the open-pore current of MspA-PBA (Ip) was ~520 pA (Fig. 1c). MspA-PBA can also be prepared in ensemble by mixing (N90C)1(M2)7 with MPBA (Methods). Unless otherwise stated, all subsequent measurements were carried out with ensemble-prepared MspA-PBA (Supplementary Fig. 4). The open-pore currents of (N90C)1(M2)7 and MspA-PBA were 623 ± 13 pA and 510 ± 14 pA, respectively (mean ± full width at half maximum (FWHM); Supplementary Fig. 4), consistent with the single-pore results in Fig. 1c. I–V curves of (N90C)1(M2)7 and MspA-PBA acquired at KCl concentrations of 0.15–2 M are presented in Supplementary Fig. 5. From the slope of the I–V curve, the conductance of MspA-PBA in the 1.5 M KCl buffer was ~2.91 nS.
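As a minimal sketch of how a slope conductance such as the ~2.91 nS above can be obtained (not the authors' analysis code), the function below fits a straight line to current versus voltage by ordinary least squares. The voltage window used for the reported value is not stated here, so which points to include is left to the caller; the function name and the example points are hypothetical. Because 1 pA/1 mV = 1 nS, the slope in pA mV−1 is numerically the conductance in nS.

```python
def slope_conductance_ns(voltages_mv, currents_pa):
    """Least-squares slope of an I-V curve, returned in nS.

    1 pA / 1 mV = 1e-12 A / 1e-3 V = 1e-9 S, so the slope in pA/mV
    is numerically equal to the conductance in nS.
    """
    n = len(voltages_mv)
    if n != len(currents_pa) or n < 2:
        raise ValueError("need at least two paired (V, I) points")
    mean_v = sum(voltages_mv) / n
    mean_i = sum(currents_pa) / n
    # Covariance and variance terms of the ordinary least-squares fit I = G*V + c
    s_vi = sum((v - mean_v) * (i - mean_i) for v, i in zip(voltages_mv, currents_pa))
    s_vv = sum((v - mean_v) ** 2 for v in voltages_mv)
    if s_vv == 0:
        raise ValueError("voltages must not all be identical")
    return s_vi / s_vv


# Hypothetical, perfectly linear points for illustration (not measured data).
example_v = [-200, -100, 0, 100, 200]              # mV
example_i = [-600.0, -300.0, 0.0, 300.0, 600.0]    # pA
print(slope_conductance_ns(example_v, example_i))  # 3.0
```

A slope conductance need not equal the chord value Ip/V at a single bias (for example, 510 pA/200 mV ≈ 2.55 nS) if the I–V curve is not strictly linear through the origin.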

Fig. 1: Discrimination of canonical NMPs using a PBA-modified MspA.

a, The structure of (N90C)1(M2)7, a hetero-octameric MspA composed of seven units of M2 MspA-D16H6 (grey) and one unit of N90C MspA-H6 (pink). (N90C)1(M2)7 contains a sole cysteine (blue). Boxed inset: top view of (N90C)1(M2)7. b, The mechanism of NMP identification. A PBA was introduced to the pore constriction by modifying the sole cysteine thiol with MPBA via Michael addition. NMPs react reversibly with PBA, generating stochastic sensing events. c, Single-channel observation of MPBA modification. After the addition of MPBA, a current drop of about 100 pA was observed, indicating successful MPBA modification. After the subsequent addition of AMP, successive binding events immediately appear. To minimize bilayer rupture, the applied bias was switched to +20 mV whenever the Faraday cage was opened. The large noise arises from opening the Faraday cage to add MPBA or AMP. The open-pore current (Ip) and the blockage level (Ib) are marked. d, The NMPs and their corresponding events. Top: chemical structures of CMP (C), UMP (U), AMP (A) and GMP (G), with the nucleobases clearly shown. Bottom: representative sensing events. NMPs were added to cis at a final concentration of 300 μM for each analyte. Ip is marked with a grey dashed line. The blockage levels are marked with colour bands. e, Top: scatter plot of %Ib versus SD for events of the four types of NMP. Bottom: corresponding event histogram of %Ib. Events were acquired from four individual measurements, in which each type of NMP was separately added to cis at a final concentration of 300 μM. Five hundred successive events of each NMP were used to generate the statistics. f, A representative trace when the four types of NMP were sensed simultaneously. NMPs were simultaneously added to cis at a final concentration of 300 μM for each analyte. Events of different NMPs were identified by their characteristic blockage depths and are labelled C, U, A and G, respectively.

NMPs, the monomeric units of RNA, consist of a ribose, a phosphate group and a nucleobase. Because the ribose carries a cis-diol, NMPs have an affinity for PBA43 and may be detected directly by MspA-PBA. To test this, single-channel recording was performed with MspA-PBA in a 1.5 M KCl buffer (1.5 M KCl, 10 mM MOPS, pH 7.0) (Methods), with a constant transmembrane potential of +200 mV. Four canonical NMPs, adenosine monophosphate (AMP), guanosine monophosphate (GMP), cytidine monophosphate (CMP) and uridine monophosphate (UMP), were tested as analytes (Fig. 1d). Successive resistive pulses caused by NMPs were immediately observed (Fig. 1d). In contrast, no events were observed with M2 MspA, confirming that the PBA at the pore constriction is critical for generating NMP sensing events (Supplementary Fig. 6). Deoxyribonucleoside monophosphates (dNMPs) produced no events with MspA-PBA (Supplementary Fig. 7), as expected, because they lack the cis-diol required for sensing.

To describe NMP sensing events quantitatively, the event dwell time (toff), the inter-event interval (ton), the percentage blockage (\(\% I_{\mathrm{b}} = (I_{\mathrm{p}} - I_{\mathrm{b}})/I_{\mathrm{p}}\)) and the noise amplitude (SD) were derived as described in Supplementary Fig. 8. The histograms of toff and ton generally follow exponential distributions and were fitted to derive the mean time constants τoff and τon, respectively. The histograms of %Ib and SD follow Gaussian distributions, which were fitted to derive the mean percentage blockage \(\overline {\% I_{\mathrm{b}}}\) and \(\overline {\mathrm{SD}}\), respectively. When the NMP concentration in cis was varied, the reciprocal of the dwell time (1/τoff) remained constant, whereas the reciprocal of the inter-event interval (1/τon) correlated linearly with the NMP concentration (Supplementary Tables 2–5 and Supplementary Figs. 9–12). The voltage dependence of NMP sensing was also investigated with AMP as a representative analyte. Generally, as the voltage increases, 1/τoff decreases and 1/τon increases (Supplementary Table 6 and Supplementary Fig. 13). This is expected because NMPs are negatively charged at pH 7.0, so the electrophoretic force strongly regulates the binding kinetics.
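The quantities defined above can be computed from per-event measurements roughly as follows. This is a minimal sketch, not the authors' pipeline: instead of fitting histograms as described in the text, it uses the maximum-likelihood estimate of a single-exponential time constant, which is simply the sample mean (this ignores events shorter than the recording dead time). It also assumes a simple bimolecular binding scheme, in which kon is the slope of 1/τon versus concentration and koff = 1/τoff. The function names and example numbers are hypothetical.

```python
from statistics import mean


def percent_blockage(i_open_pa, i_blocked_pa):
    """%Ib = (Ip - Ib) / Ip, expressed as a percentage."""
    return 100.0 * (i_open_pa - i_blocked_pa) / i_open_pa


def exp_time_constant(durations_s):
    """Maximum-likelihood time constant of a single-exponential distribution.

    For an exponential distribution the MLE of tau is the sample mean.
    """
    return mean(durations_s)


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


# Hypothetical inputs for illustration only (not measured data).
conc_um = [100.0, 200.0, 300.0]      # NMP concentration in cis, uM
tau_on_s = [0.10, 0.05, 1.0 / 30.0]  # mean inter-event intervals, s
tau_off_s = 0.02                     # mean dwell time, s (concentration independent)

# Under the assumed bimolecular scheme: 1/tau_on = k_on * c and k_off = 1/tau_off.
k_on, _ = linear_fit(conc_um, [1.0 / t for t in tau_on_s])  # uM^-1 s^-1
k_off = 1.0 / tau_off_s                                      # s^-1
```

A constant 1/τoff together with a linear 1/τon, as reported above, is what this simple scheme predicts; voltage-dependent rates would need to be estimated separately at each bias.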

The conical lumen of MspA provides high resolution for distinguishing analytes with minor structural differences37. Binding of different NMPs to MspA-PBA results in highly distinguishable event features (Fig. 1d). These differences become more pronounced at higher applied voltages (Supplementary Fig. 14), so all subsequent measurements were carried out at +200 mV unless otherwise stated. Under this condition, events generated by different NMPs form highly distinguishable populations in the scatter plot of %Ib versus SD (Fig. 1e). The %Ib histograms of the different NMP events also show fully separated Gaussian distributions (Fig. 1e and Supplementary Fig. 15), in which CMP (\(\overline {\% I_{\mathrm{b}}}\) = 7.1 ± 0.2%, N = 3, where N is the number of independent measurements), UMP (\(\overline {\% I_{\mathrm{b}}}\) = 8.64 ± 0.09%, N = 3), AMP (\(\overline {\% I_{\mathrm{b}}}\) = 10.89 ± 0.14%, N = 3) and GMP (\(\overline {\% I_{\mathrm{b}}}\) = 11.8 ± 0.2%, N = 3) are fully resolved (Supplementary Table 7 and Supplementary Figs. 16 and 17). Further details of the NMP binding kinetics are summarized in Supplementary Table 7 and Supplementary Fig. 17. Simultaneous sensing of CMP, UMP, AMP and GMP with MspA-PBA was also performed (Fig. 1f and Supplementary Fig. 18), and the identity of each NMP could be called directly from its distinct blockage characteristics. To the best of our knowledge, nanopore discrimination between canonical NMPs without any overlap in the event distributions has not been reported previously.
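Because the %Ib distributions of the four canonical NMPs do not overlap, an event can, to a first approximation, be called from its blockage alone. The sketch below assigns each event to the NMP whose reported mean %Ib is closest (a one-dimensional nearest-centroid rule). This is a simplification for illustration, not the classifier used in this work; the function names are hypothetical. With these means, the implied decision thresholds are the midpoints 7.87%, 9.765% and 11.345%.

```python
# Mean %Ib values reported above for the canonical NMPs
# (MspA-PBA, 1.5 M KCl, +200 mV).
MEAN_PERCENT_IB = {"C": 7.1, "U": 8.64, "A": 10.89, "G": 11.8}


def call_by_blockage(percent_ib, centres=MEAN_PERCENT_IB):
    """Return the NMP label whose mean %Ib is closest to this event's %Ib."""
    return min(centres, key=lambda nmp: abs(centres[nmp] - percent_ib))


def call_trace(event_percent_ibs):
    """Label a sequence of events, e.g. from a mixture trace as in Fig. 1f."""
    return [call_by_blockage(p) for p in event_percent_ibs]


print(call_trace([7.0, 8.7, 11.0, 11.9]))  # ['C', 'U', 'A', 'G']
```

This one-feature rule stops working once modified NMPs are included (for example, UMP and m5C overlap in %Ib), which is why the analyses that follow combine %Ib with SD.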

Distinguishing epigenetic NMPs

About 170 epigenetic NMPs have been reported in the literature1. Their structural differences are extremely minor, which poses a great challenge for direct identification. This challenge may be addressed by monitoring the features of nanopore events when epigenetic NMPs bind to MspA-PBA. To test this, the same measurements were carried out with the monophosphates of m5C, m6A, m7G, m1A, inosine (I), ψ and dihydrouridine (D) as analytes. Because model compounds were not commercially available, the monophosphates of ψ (Supplementary Fig. 19) and D (Supplementary Fig. 20) were custom-synthesized and characterized by WuXi AppTec. These epigenetic NMPs cover the common types of modification of canonical NMPs, namely methylation, deamination, isomerization and reduction. Their nucleobases are shown in Fig. 2a. As shown in Fig. 2a and Supplementary Fig. 21, events of the epigenetic NMPs have significantly different blockage amplitudes. To compare all NMPs tested so far, the %Ib distribution of each NMP is shown in a violin plot (Fig. 2b): almost all NMPs are distinguishable by %Ib alone, although the event distributions of UMP and m5C still partially overlap. The broad distributions of ψ and m7G result from non-specific events lying away from the main event populations, which may arise from impurities introduced during compound synthesis. However, these non-specific events account for only 0.9% and 1.7% of all detected events, respectively (Supplementary Fig. 22). The noise characteristics of the events may also be included in the analysis to improve discrimination (Supplementary Table 7 and Supplementary Figs. 23 and 24).
In a scatter plot of %Ib versus SD for events acquired with the 11 analytes, 11 fully resolved populations appear, one for each NMP (Fig. 2c). This confirms that this sensing configuration is compatible with epigenetic NMPs and that their events are fully distinguishable. Direct nanopore discrimination among these 11 types of NMP has not been reported before. The discrimination between epigenetic NMPs and their canonical counterparts is shown in Supplementary Fig. 25.

Fig. 2: Epigenetic NMPs identified by MspA-PBA.

a, Top: the epigenetic NMPs investigated in this paper. Seven types of epigenetic NMP, the monophosphates of m5C, m6A, ψ, D, I, m7G and m1A, were investigated. For ease of display, only the nucleobases are shown, with all modifications highlighted in red. Bottom: representative events of the corresponding NMPs, from left to right: m5C, m6A, ψ, D, I, m7G and m1A. The measurements were carried out as described in Methods in a 1.5 M KCl buffer (1.5 M KCl, 10 mM MOPS, pH 7.0). A transmembrane potential of +200 mV was continually applied. Epigenetic NMPs were added to cis at a final concentration of 300 μM for each analyte. The open-pore current (Ip) of MspA-PBA is marked with a dashed line. The blockage levels are marked with colour bands. Noticeable differences in %Ib and SD were observed. b, Violin plot of %Ib for different NMPs. On the basis of %Ib alone, different NMPs are generally distinguishable, except for U and m5C. IQR, interquartile range. c, Scatter plot of %Ib versus SD for canonical and epigenetic NMPs. Events from the 11 types of NMP are clearly distinguishable when both %Ib and SD are considered. Events in b and c were acquired from 11 independent measurements, in which each type of NMP was separately added to cis as the sole analyte at a final concentration of 300 μM. Five hundred successive events of each NMP were used to generate the statistics in b and c.

NMP identification by machine learning

A machine learning algorithm was established to identify NMPs automatically. The overall training process includes dataset input, feature extraction and model building (Fig. 3a and Methods). All events in the dataset carry known labels, because each was acquired with a single NMP of known identity as the sole analyte. The %Ib and SD of each event were automatically extracted with MATLAB to form a feature matrix. Mainstream models were evaluated, and all achieved satisfactory validation accuracies, indicating that the input data are of high quality. The Kernel Naïve Bayes and Linear Support Vector Machine (SVM) models achieved the highest accuracy score, 0.996 (Supplementary Table 8). The Linear SVM model was selected for its better performance on the testing set. The confusion matrix obtained by testing the Linear SVM model is shown in Fig. 3b; most NMP classes are identified with 99% or 100% accuracy. The decision boundaries generated by the Linear SVM model are shown in Fig. 3c.
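The validation workflow described above (features %Ib and SD, tenfold cross-validation, and a confusion matrix with per-class TPR and FNR) can be sketched without MATLAB. Assumption: to stay dependency-free, the sketch substitutes a nearest-centroid classifier for the Linear SVM, so it illustrates the workflow, not the authors' model; all names are hypothetical. If scikit-learn is available, `sklearn.svm.LinearSVC` with `sklearn.model_selection.cross_val_score(..., cv=10)` and `sklearn.metrics.confusion_matrix` would be the closer analogue.

```python
import random
from statistics import mean, pstdev


class NearestCentroid:
    """Stand-in classifier (NOT the Linear SVM used in the paper).

    Features are z-scored with statistics from the training data only,
    and each event is assigned to the class with the closest mean vector.
    """

    def fit(self, X, y):
        cols = list(zip(*X))
        self.mu = [mean(c) for c in cols]
        self.sd = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(self._scale(x))
        self.centroids = {lab: [mean(c) for c in zip(*rows)] for lab, rows in groups.items()}
        return self

    def _scale(self, x):
        return [(v - m) / s for v, m, s in zip(x, self.mu, self.sd)]

    def predict(self, X):
        preds = []
        for x in X:
            z = self._scale(x)
            preds.append(min(self.centroids, key=lambda lab: sum(
                (a - b) ** 2 for a, b in zip(z, self.centroids[lab]))))
        return preds


def k_fold_accuracy(X, y, k=10, seed=0):
    """Pooled accuracy over k folds of a shuffled (unstratified) split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = NearestCentroid().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    m = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def tpr_fnr(matrix):
    """Per-class true positive rate and false negative rate (= 1 - TPR)."""
    rates = {}
    for label, row in matrix.items():
        total = sum(row.values())
        if total:
            tpr = row[label] / total
            rates[label] = (tpr, 1.0 - tpr)
    return rates
```

Scaling is fitted on the training folds only, so the held-out fold does not leak into the feature normalization; a linear SVM would replace only the `fit`/`predict` pair.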

Fig. 3: Machine learning assisted NMP identification.

a, Flow diagram of the training process. Eleven classes of events, C, U, A, G, m5C, m6A, ψ, D, I, m7G and m1A, were used as the input dataset. Each class is composed of 400 events, randomly selected from the pool of events acquired with the corresponding analyte. The mean and the standard deviation of each event were extracted to form a feature matrix. The matrix was randomly split into a training subset for model training and a validation subset for model validation, with which tenfold cross-validation was performed. All classifiers in the MATLAB Classification Learner app, such as Naïve Bayes, SVM, K-nearest neighbour (KNN) and bagged trees, were evaluated to find the best-performing model. The SVM and Naïve Bayes models achieved the highest accuracy score, 0.996. The SVM was selected for all further investigations. b, Confusion matrix of NMP classification generated with the SVM model. One hundred events from each NMP class were used as the testing set. The true positive rate (TPR) and the false negative rate (FNR) are shown on the right. c, Decision boundaries produced by the SVM model. Each coloured region represents the area in which an event is predicted as the corresponding NMP. The scatter plot of %Ib versus SD for the testing data is superimposed on the decision boundaries for illustration. d, A representative trace acquired by simultaneous sensing of the 11 types of NMP. The measurements were carried out as described in Methods. The chambers were filled with a 1.5 M KCl buffer (1.5 M KCl, 10 mM MOPS, pH 7.0). A transmembrane potential of +200 mV was continually applied. NMPs were simultaneously added to cis at a final concentration of 100 μM for each analyte. Characteristic events from different NMPs were automatically predicted by the trained SVM model and labelled with coloured dots (CMP, red; UMP, blue; AMP, green; GMP, purple; m5C, yellow; m6A, orchid; ψ, orange; I, lime; D, cyan; m7G, teal; m1A, pink).

The trained Linear SVM model was then employed to predict events of unknown identity. Modified NMPs were added to cis in the order m5C, m6A, I, m7G, m1A, ψ and D, with CMP, UMP, AMP and GMP already present. With the Linear SVM model, each newly added NMP was accurately identified (Supplementary Figs. 26 and 27). To evaluate the training efficiency of the model, learning curves were generated with training and validation data (Supplementary Fig. 28), which show that 176 events were required for the model to reach an accuracy of 0.990. To demonstrate event identification from a mixture, a representative trace containing events from the 11 different NMPs is shown in Fig. 3d and Supplementary Movie 1. Different NMP types can be recognized, and the labels predicted by machine learning are marked above the trace. This enables automatic nanopore sensing in realistic measurement scenarios in which different NMPs are present as a mixture.
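A learning curve of the kind described above tracks held-out accuracy as the number of training events grows. The sketch below is an illustration under stated assumptions: it uses a nearest-centroid classifier as a stand-in for the Linear SVM, assumes the features are already on comparable scales (for example, z-scored), and uses a single fixed test split. The authors' exact protocol, which yielded the 176-event figure, may differ, and all names here are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Per-class mean feature vector (nearest-centroid stand-in, not an SVM)."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {lab: [sum(col) / len(col) for col in zip(*rows)] for lab, rows in groups.items()}


def predict_one(centroids, x):
    """Label of the centroid nearest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def learning_curve(X, y, sizes, test_fraction=0.25, seed=0):
    """Held-out accuracy versus number of training events.

    A fixed test set is split off once; a model is then trained on the
    first n events of the remaining (shuffled) pool for each n in `sizes`.
    Returns a list of (n_train, accuracy) pairs.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, pool = idx[:n_test], idx[n_test:]
    if n_test == 0 or max(sizes) > len(pool):
        raise ValueError("not enough events for the requested split and sizes")
    curve = []
    for n in sizes:
        train = pool[:n]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict_one(cents, X[i]) == y[i] for i in test)
        curve.append((n, hits / n_test))
    return curve
```

The point at which the curve plateaus gives a practical lower bound on how many reference events per analyte are worth collecting.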

Sensing of epigenetic NMPs from methylated microRNA

We further sought to demonstrate direct sensing of epigenetic NMPs in RNAs (Fig. 4a). The RNA is first digested into NMPs by S1 nuclease, and the NMPs are then sensed by MspA-PBA. The observed nanopore events are identified by the previously trained machine learning model. Two microRNAs with known methylation sites44, hsa-miR-21 and hsa-miR-17, were used. Specifically, hsa-miR-21 contains an m5C at position 9 and hsa-miR-17 contains an m6A at position 13 (Supplementary Table 9). When hsa-miR-21 and hsa-miR-17 were sensed by MspA-PBA without any enzymatic treatment, only short spike-like events with ill-defined amplitudes were observed (Supplementary Fig. 29), indicating that this sensing configuration is insensitive to the intact template RNAs themselves. To minimize interference from the glycerol in the S1 nuclease stock solution (Supplementary Fig. 30), the S1 nuclease was pretreated by ultrafiltration to remove glycerol (Methods and Supplementary Fig. 31). The pretreated S1 nuclease was then used to digest the microRNAs at 23 °C for 4 h. Gel electrophoresis showed that both microRNAs were completely digested (Methods and Supplementary Fig. 32). The digestion product was then ultrafiltered to remove the S1 nuclease before nanopore measurements (Methods). During nanopore measurement, the hsa-miR-21 digestion product was added to cis at a final concentration of 100 ng μl−1. A representative trace is shown in Fig. 4b, in which many NMP binding events were observed, suggesting that the generated NMPs are well detected by MspA-PBA. The NMP identities were called by the algorithm, and the events are highly discriminable (Supplementary Fig. 33).
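To put the loading of 100 ng μl−1 in molar terms, the back-of-the-envelope sketch below assumes that this figure refers to the input RNA mass, that digestion is complete, and an average mass of ~340 g mol−1 per nucleotide (a common rule of thumb; the true value depends on composition and counter-ions). Under these assumptions, 100 ng μl−1 corresponds to roughly 0.3 mM total NMPs, and a single modified site in a 22-nt miRNA such as hsa-miR-21 to roughly 13 μM of that modified NMP. The constant and function names are hypothetical.

```python
# Assumed average mass per nucleotide of RNA (a common rule of thumb);
# the true value depends on sequence composition and counter-ions.
AVG_NT_MASS_G_PER_MOL = 340.0


def total_nmp_um(rna_ng_per_ul, avg_nt_mass=AVG_NT_MASS_G_PER_MOL):
    """Approximate total NMP concentration (uM) after complete digestion.

    1 ng/ul equals 1 mg/l, i.e. 1e-3 g/l.
    """
    grams_per_litre = rna_ng_per_ul * 1e-3
    return grams_per_litre / avg_nt_mass * 1e6  # mol/l -> uM


def site_um(rna_ng_per_ul, sites, length_nt):
    """Approximate concentration of an NMP present `sites` times per molecule."""
    return total_nmp_um(rna_ng_per_ul) * sites / length_nt


print(round(total_nmp_um(100), 1))    # 294.1
print(round(site_um(100, 1, 22), 1))  # 13.4
```

Treat the output as an order-of-magnitude estimate only.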

Fig. 4: Detection of epigenetic modifications from RNA.

a, Schematic diagram of NMP identification from RNA using MspA-PBA. S1 nuclease (green), an endonuclease insensitive to epigenetic modifications, was employed to decompose target RNAs into NMPs. The generated NMPs were then characterized with MspA-PBA, enabling quantitative profiling of RNA modifications. b, The sequence of hsa-miR-21 and a corresponding trace from nanopore sensing of its digestion products. Hsa-miR-21 was reported to contain an m5C at position 9. Characteristic events from C, U, A, G and m5C were clearly detected in the trace and are marked accordingly. The blockage level of m5C is marked with a yellow dashed line. Although the blockage level of m5C is close to that of U, the noise of m5C is significantly larger. c, Scatter plot of %Ib versus SD for hsa-miR-21 digestion products. Events shown were acquired from a 60 min continuous recording. The NMP identity was predicted by the SVM model. Five populations, from events of C, U, A, G and m5C, respectively, were detected. d, The sequence of hsa-miR-17 and a corresponding trace from nanopore sensing of its digestion products. Hsa-miR-17 was reported to contain an m6A at position 13. Characteristic events of C, U, A, G and m6A were clearly detected and are marked with the corresponding labels. The blockage level of m6A is marked with an orchid dashed line. e, Scatter plot of %Ib versus SD for hsa-miR-17 digestion products. Events shown were acquired from a 60 min continuous recording. The NMP identity was predicted by the SVM model. Five populations, C, U, A, G and m6A, were detected. All measurements were carried out as described in Methods in a 1.5 M KCl buffer (1.5 M KCl, 10 mM MOPS, pH 7.0). A transmembrane potential of +200 mV was continually applied. MicroRNA digestion products were added to cis at a final concentration of 100 ng μl−1.

With hsa-miR-21, five types of NMP were detected, CMP, UMP, AMP, GMP and m5C (Fig. 4b,c), consistent with the composition of the hsa-miR-21 sequence (Supplementary Table 9). The abundance of each NMP type in hsa-miR-21 was also evaluated from its event rate followed by a calibration (Methods and Supplementary Table 10). The relative NMP composition of hsa-miR-21 was estimated to be 2.17 CMP, 6.81 UMP, 6.88 AMP, 4.92 GMP, 1.03 m5C, 0.06 I, 0.01 ψ and 0.10 D (Supplementary Fig. 34), generally consistent with the true values. The misassignment of events to I, ψ and D results from the minor overlaps between the event distributions of AMP, ψ, GMP and I; however, the proportion of misassigned events is negligible. The feasibility of identifying epigenetic NMPs from miRNA is thus demonstrated. To test the generality of the approach, hsa-miR-17 was analysed in the same way as hsa-miR-21. A representative trace containing nanopore sensing events of the hsa-miR-17 digestion products is shown in Fig. 4d. The scatter plot shows five dominant populations of NMP events, corresponding to CMP, UMP, AMP, GMP and m6A, respectively (Fig. 4e), consistent with the sequence composition of hsa-miR-17 (Supplementary Table 9). Quantitative analysis gives a relative count of 1.08 for m6A, indicating a single m6A site in hsa-miR-17 (Supplementary Fig. 34), also consistent with expectations.
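The calibration itself is described in Methods and Supplementary Table 10, which are not reproduced here. As a hedged sketch of one plausible scheme: if each NMP's event rate is proportional to its concentration (1/τon = kon·c, as found earlier), dividing the observed rate by a per-NMP calibration constant gives relative concentrations, which can then be rescaled to a nucleotide count. The rescaling target is an assumption; note that the hsa-miR-21 values reported above sum to ≈22 (21.98), matching a 22-nt miRNA. The function name and all numbers below are hypothetical.

```python
def relative_composition(event_rates, k_on, total_nt):
    """Estimate per-NMP nucleotide counts from calibrated event rates.

    event_rates: {nmp: events per second assigned to that NMP}
    k_on:        {nmp: calibration slope of event rate versus concentration}
    total_nt:    number of nucleotides the estimates are rescaled to
    """
    # Relative concentration of each NMP: observed rate / per-NMP sensitivity.
    conc = {nmp: rate / k_on[nmp] for nmp, rate in event_rates.items()}
    total = sum(conc.values())
    if total == 0:
        raise ValueError("no events to quantify")
    return {nmp: total_nt * c / total for nmp, c in conc.items()}


# Hypothetical rates and calibration constants (illustration only).
rates = {"C": 0.4, "m5C": 0.1, "U": 1.6}
k_cal = {"C": 0.2, "m5C": 0.1, "U": 0.2}
print(relative_composition(rates, k_cal, total_nt=11))
```

Without the per-NMP calibration, NMPs that bind PBA faster would be over-counted, because raw event rates reflect both abundance and binding kinetics.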

Detection of epigenetic NMPs from brewer’s yeast tRNAPhe

Transfer RNA (tRNA) is a small RNA that links the nucleotide sequence of mRNA to the amino acid sequence of proteins. Mature tRNAs are also heavily modified: more than 90 types of modification have been found in tRNA45. tRNA is thus an ideal RNA for evaluating how well MspA-PBA identifies epigenetic modifications in natural samples. The brewer's yeast phenylalanine-specific tRNA (yeast tRNAPhe)38 was used as a model RNA. A mature yeast tRNAPhe reportedly contains 14 modified sites, comprising 11 types of modification: N2-methylguanosine (m2G), dihydrouridine (D), N2,N2-dimethylguanosine (\(m_2^2G\)), 2′-O-methylcytidine (Cm), 2′-O-methylguanosine (Gm), wybutosine (Y), ψ, m5C, m7G, 5-methyluridine (T) and m1A (Fig. 5a)46. When the yeast tRNAPhe is enzymatically decomposed into NMPs, the monophosphates of D, ψ, m5C, m7G, m1A, m2G, \(m_2^2G\), T and Y are in principle detectable by MspA-PBA because their ribose cis-diols remain unmodified. The event features of D, ψ, m5C, m7G and m1A were acquired previously and used for model training (Figs. 2a and 3), so their events are identifiable by the machine learning algorithm. The monophosphates of m2G, \(m_2^2G\), T and Y should also be detectable and appear as new clusters of events; however, because no pure reference compounds were available to produce training events, these events are detectable but not identifiable. Cm and Gm, whose 2′-O-methylation removes the ribose cis-diol, are in principle undetectable by MspA-PBA.
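The reasoning above reduces to two checks: is the ribose cis-diol intact, and does the model have reference events for that NMP? The sketch below encodes this rule for the tRNAPhe modifications listed above. It is an illustrative simplification (it only knows 2′-O-methylation as a diol-removing modification), the names are hypothetical, and \(m_2^2G\) is written as the plain string "m22G".

```python
# NMP classes with reference events used for model training (Figs. 2 and 3).
TRAINED = {"C", "U", "A", "G", "m5C", "m6A", "ψ", "D", "I", "m7G", "m1A"}

# 2'-O-methylated nucleotides: the 2'-OH is capped, so no cis-diol for PBA.
TWO_O_METHYLATED = {"Cm", "Gm"}


def expected_readout(nmp):
    """Expected MspA-PBA outcome for a digested nucleotide (sketch)."""
    if nmp in TWO_O_METHYLATED:
        return "undetectable"      # no ribose cis-diol to bind PBA
    if nmp in TRAINED:
        return "identifiable"      # cis-diol intact and a trained class
    return "detectable only"       # cis-diol intact but no reference events


TRNA_PHE_MODS = ["m2G", "D", "m22G", "Cm", "Gm", "Y", "ψ", "m5C", "m7G", "T", "m1A"]
for mod in TRNA_PHE_MODS:
    print(mod, expected_readout(mod))
```

The three outcomes correspond to the red, blue and grey categories of Fig. 5a.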

Fig. 5: Quantitative detection of epigenetic modifications of yeast tRNAPhe.

a, The sequence and the modifications of yeast tRNAPhe. Modified positions, including m2G, D, \(m_2^2G\), Cm, Gm, Y, ψ, T, m5C, m7G and m1A, are shaded. These modifications fall into three categories. D, ψ, m5C, m7G and m1A (shaded with red circles), which have known event features, are identifiable. m2G, \(m_2^2G\), T and Y (shaded with blue circles) are in principle detectable by MspA-PBA but are not identifiable. Lacking a cis-diol, Cm and Gm (shaded with grey circles) are in principle not detectable by MspA-PBA. b, Gel electrophoresis results. Lane 1: Low Range ssRNA Ladder (New England Biolabs). Lane 2: yeast tRNAPhe. Lane 3: yeast tRNAPhe treated with S1 nuclease. The gel shows that yeast tRNAPhe was completely digested by S1 nuclease. Details of yeast tRNAPhe digestion are given in Methods. c, Scatter plot of %Ib versus SD for tRNAPhe digestion products. Events shown were acquired from a 240 min continuous recording. The NMP identity was predicted by the Linear SVM model. Nine main populations, corresponding to events of C, U, A, G, m5C, ψ, D, m7G and m1A, respectively, were identified. Four populations of events that do not belong to any previously identified event type were also detected by the unsupervised learning method DBSCAN (Supplementary Fig. 37). UT, unidentified events. d, Comparison between the yeast tRNAPhe composition derived from measurements and the true value. The measured values were determined and calibrated as described in Methods. e, Representative traces acquired during nanopore sensing of the tRNAPhe digestion products. Characteristic events of canonical and epigenetic NMPs were identified by machine learning and are marked with the corresponding labels. All measurements were carried out as described in Methods in a 1.5 M KCl buffer (1.5 M KCl, 10 mM MOPS, pH 7.0). A transmembrane potential of +200 mV was continually applied. Yeast tRNAPhe digestion product was added to cis at a final concentration of 100 ng μl−1.

tRNAPhe was first digested with S1 nuclease at 23 °C for 15 h to produce NMPs (Methods). Gel electrophoresis confirmed that the tRNAPhe had been completely digested (Fig. 5b). The digestion product was then ultrafiltered to remove the S1 nuclease and used in subsequent nanopore measurements with MspA-PBA (Methods). The yeast tRNAPhe digestion product was added to cis at a final concentration of 100 ng μl−1. The acquired raw events are shown in a scatter plot (Supplementary Fig. 35). Glycerol events, introduced by the S1 nuclease stock solution, were removed from the dataset by machine learning (Supplementary Fig. 35). To cope with unknown epigenetic modifications in yeast tRNAPhe, we combined supervised and unsupervised learning algorithms to identify the remaining NMP events. A One-Class SVM was first employed to recognize events that do not belong to any previously trained event type; these events are considered outliers. Conversely, events that match the previously trained event types are considered inliers (Supplementary Fig. 36) and are further identified by the trained Linear SVM model. The outliers were analysed by density-based spatial clustering of applications with noise (DBSCAN) to detect events that form clusters (Supplementary Fig. 37). Non-clustered events, which are randomly distributed in the scatter plot, are considered background events and removed from the dataset without further analysis.
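The inlier/outlier triage described above can be sketched as follows. Assumptions: the One-Class SVM is replaced by a simple distance-to-nearest-trained-centroid test (a named stand-in, not the authors' model), and the features are assumed to be standardized so that a single radius and a single eps are meaningful; the thresholds and function names are hypothetical. The DBSCAN function is a textbook implementation. With scikit-learn, `sklearn.svm.OneClassSVM` and `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` are the library counterparts; as in scikit-learn, a point's neighbourhood here counts the point itself toward `min_pts`.

```python
def _dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def split_inliers(points, centroids, radius):
    """Stand-in for the One-Class SVM: a point is an inlier if it lies within
    `radius` of any trained class centroid, otherwise an outlier."""
    inliers, outliers = [], []
    for p in points:
        near = any(_dist2(p, c) <= radius ** 2 for c in centroids.values())
        (inliers if near else outliers).append(p)
    return inliers, outliers


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point: cluster id 0, 1, ...
    or -1 for noise. A point's neighbourhood includes the point itself."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if _dist2(points[i], q) <= eps ** 2]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins cluster, not expanded
                continue
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:   # core point: keep expanding
                queue.extend(more)
    return labels


def triage(points, centroids, radius, eps, min_pts):
    """Inliers go on to the supervised classifier; outliers are clustered, and
    unclustered outliers (label -1) are discarded as background."""
    inliers, outliers = split_inliers(points, centroids, radius)
    clusters = {}
    for p, lab in zip(outliers, dbscan(outliers, eps, min_pts)):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    return inliers, clusters
```

Each returned cluster is a candidate new event type (such as the four unidentified populations in Fig. 5c); identifying it still requires a reference compound.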

The modification profile of yeast tRNAPhe is shown in Fig. 5c. D, ψ, m5C, m7G and m1A were successfully detected, consistent with the training results and the literature46. A few m6A events were also observed; these may be background events that coincidentally share the event features of m6A, or may originate from other types of RNA present in the sample. Four new clusters of events, with features different from those of all NMP types used for training, were also observed. These clusters are likely to originate from m2G, \(m_2^2G\), T, Y or other unknown modifications in yeast tRNAPhe. Quantitative analysis shows that the relative NMP composition of yeast tRNAPhe is 17.53 GMP, 16.36 AMP, 16.19 CMP, 12.06 UMP, 3.24 ψ, 2.17 D, 1.53 m5C, 0.40 m7G, 0.37 m1A, 0.11 m6A and 0.04 I, generally in accordance with the calculated true values (Fig. 5d)46. Three independent trials were performed (Supplementary Fig. 38), all leading to the same conclusion and confirming the repeatability of this technique. Representative traces containing events of the yeast tRNAPhe digestion products are presented in Fig. 5e. These results demonstrate the capacity of MspA-PBA to measure NMPs and their epigenetic modifications in natural RNAs.

Conclusions

A hetero-octameric MspA containing a single PBA is reported for sensing NMPs. Eleven types of NMPs were fully distinguished, outperforming results demonstrated with α-HL32,33 or solid-state nanopores47,48. A machine learning model was built, achieving an accuracy of 0.996. This work reports the largest number of NMP types that have been fully distinguished using a nanopore. The main limitation is that the current sensing strategy cannot detect ribose-modified NMPs, such as Cm and Gm (ref. 1). Compared with mass spectrometry, our method offers higher resolution, especially in distinguishing RNA positional isomers (Supplementary Fig. 39). It is thus well suited to detecting RNA modifications in mixed and native samples, without the need for coupling to chromatographic separation or for complex data interpretation. This sensing strategy was also applied to identify enzymatically cleaved NMPs from native RNA samples, suggesting the feasibility of exo-sequencing using enzyme-conjugated MspA-PBA. Although not demonstrated here, this strategy is in principle suitable for sensing nucleoside diphosphates, nucleoside triphosphates, other nucleotide modifications, nucleotide sugars and nucleoside drugs, as long as the cis-diol of the ribose is retained.

Methods

Preparation of homo-octameric MspAs

The genes coding for monomeric M2 MspA-D16H6 (D90N/D91N/D93N/D118R/D134R/E139K) and N90C MspA-H6 (D90C/D91N/D93N/D118R/D134R/E139K) were synthesized separately, and each was inserted into a pET 30a(+) plasmid (GenScript). A hexa-histidine tag (H6), which assists purification by nickel affinity chromatography, was added to the C terminus of both genes. A 16-aspartate tag (D16) was added to the end of the M2 MspA-D16H6 gene to improve discrimination between octameric M2 MspA-D16H6 and N90C MspA-H6 during gel electrophoresis.

The preparation of homo-octameric M2 MspA-D16H6 and N90C MspA-H6 was performed as previously reported49. Experimentally, 100 ng of either recombinant plasmid was added to 100 μl of Escherichia coli BL21 (DE3) pLysS competent cells (Sangon Biotech) and incubated on ice for 30 min. After heat-shock transformation at 42 °C for 90 s, the mixture was incubated on ice for another 3 min. The mixture was then added to 800 μl LB broth and shaken at 37 °C and 175 r.p.m. for 50 min. Subsequently, the mixture was spread onto an LB agar plate containing kanamycin (30 μg ml−1) and chloramphenicol (34 μg ml−1) and cultured for 18 h. A single colony was inoculated into 100 ml LB broth containing kanamycin (30 μg ml−1) and chloramphenicol (34 μg ml−1) in a 250 ml flask. The culture was shaken at 37 °C and 175 r.p.m. until the optical density at 600 nm (OD600) reached 0.7. Isopropyl β-D-1-thiogalactopyranoside (IPTG) was then added to a final concentration of 0.5 mM to induce protein expression. The culture was shaken at 16 °C and 175 r.p.m. for a further 16 h. Finally, the culture was centrifuged at 4,000 r.p.m. and 4 °C for 20 min to collect the cell pellet.

The collected bacterial pellet was resuspended in 40 ml of lysis buffer (100 mM Na2HPO4/NaH2PO4, 0.1 mM ethylenediaminetetraacetic acid (EDTA), 150 mM NaCl, 0.5% (v/v) Genapol X-80, pH 6.5), incubated at 60 °C for 10 min and then incubated on ice for 10 min. The suspension was centrifuged at 13,000 r.p.m. for 40 min at 4 °C. The supernatant was collected, filtered through a 0.2 μm syringe filter (Nalgene) and loaded onto a HisTrap HP nickel ion affinity column (GE Healthcare). The column was first washed with buffer A (0.5 M NaCl, 20 mM HEPES, 5 mM imidazole, 0.5% (v/v) Genapol X-80, pH 8.0) until the UV absorbance stabilized. It was then eluted with a linear gradient from buffer A to buffer B (0.5 M NaCl, 20 mM HEPES, 500 mM imidazole, 0.5% (v/v) Genapol X-80, pH 8.0) over six column volumes within 30 min. When homo-octameric N90C MspA-H6 was purified, tris(2-carboxyethyl)phosphine (TCEP) was added to both buffer A and buffer B at a final concentration of 2 mM to prevent the formation of disulfide bonds between cysteine residues (ref. 50). Finally, the eluted fractions were collected separately and characterized by gel electrophoresis (4–20% gradient sodium dodecyl sulfate (SDS)–polyacrylamide gel). The fractions containing the desired product were stored at −80 °C for subsequent use.

Preparation of (N90C)1(M2)7

For simplicity, the hetero-octameric MspA composed of one N90C MspA-H6 monomer and seven M2 MspA-D16H6 monomers is referred to as (N90C)1(M2)7. To prepare (N90C)1(M2)7, the genes coding for N90C MspA-H6 and M2 MspA-D16H6 were placed together in the co-expression vector pETDuet-1 (Supplementary Fig. 1). Specifically, the gene coding for N90C MspA-H6 was inserted between the NcoI and HindIII restriction sites, and the gene coding for M2 MspA-D16H6 was inserted between the NdeI and BlpI restriction sites. A hexa-histidine tag (H6) was added to the C terminus of both genes to assist purification by nickel affinity chromatography. A 16-aspartate tag (D16) was added to the end of the M2 MspA-D16H6 gene to improve discrimination between hetero-octameric MspAs during gel electrophoresis.

Experimentally, 100 ng of recombinant plasmid was transformed into 100 μl E. coli BL21 (DE3) pLysS competent cells (Sangon Biotech), which were incubated on ice for 30 min. After heat-shock transformation at 42 °C for 90 s, the mixture was incubated on ice for another 3 min. Then 800 μl LB broth was added and the mixture was shaken at 37 °C and 175 r.p.m. for 50 min. Subsequently, the mixture was spread onto an LB agar plate containing ampicillin (50 μg ml−1) and chloramphenicol (34 μg ml−1) and cultured for 18 h. A single colony was inoculated into LB broth containing ampicillin (50 μg ml−1) and chloramphenicol (34 μg ml−1). The culture was shaken at 37 °C and 175 r.p.m. until OD600 reached 0.7. It was then transferred to 1 l LB broth containing ampicillin (50 μg ml−1) and chloramphenicol (34 μg ml−1) and shaken at 37 °C and 175 r.p.m. until OD600 reached 0.6. To induce protein expression, IPTG was added to a final concentration of 0.1 mM, and the culture was shaken at 16 °C and 175 r.p.m. for another 24 h. Finally, the culture was centrifuged at 4,000 r.p.m. for 20 min at 4 °C to collect the bacterial pellet.

The collected bacterial pellet was resuspended in 160 ml of lysis buffer (100 mM Na2HPO4/NaH2PO4, 0.1 mM EDTA, 150 mM NaCl, 0.5% (v/v) Genapol X-80, pH 6.5) and incubated at 60 °C for 50 min. After incubation on ice for 30 min, the suspension was centrifuged at 13,000 r.p.m. for 40 min at 4 °C. The supernatant was collected, filtered through a 0.2 μm syringe filter (Nalgene) and loaded onto a HisTrap HP nickel ion affinity column (GE Healthcare). The column was first washed with buffer A (0.5 M NaCl, 20 mM HEPES, 5 mM imidazole, 2 mM TCEP, 0.5% (v/v) Genapol X-80, pH 8.0) until the UV absorbance reached a stable level. It was then eluted with a linear gradient from buffer A to buffer B (0.5 M NaCl, 20 mM HEPES, 500 mM imidazole, 2 mM TCEP, 0.5% (v/v) Genapol X-80, pH 8.0) over 12 column volumes within 60 min. The eluted fractions were collected separately and characterized by gel electrophoresis on a 4–20% gradient SDS–polyacrylamide gel (Supplementary Fig. 2). The fractions corresponding to all hetero-octameric MspAs were pooled for further purification.

Further separation of hetero-octameric MspA was performed on a 10% SDS–polyacrylamide gel (Supplementary Fig. 3). Gel electrophoresis was run continuously for 16 h at an applied potential of +160 V. The gel was then stained with Coomassie Brilliant Blue (1.25 g Coomassie Brilliant Blue R250, 225 ml methanol, 50 ml glacial acetic acid, 225 ml ultrapure water) for 4 h. Subsequently, it was immersed in destaining buffer (400 ml methanol, 100 ml glacial acetic acid, made up to 1 l with ultrapure water) until the protein bands were clearly visible. The protein band corresponding to (N90C)1(M2)7 was excised from the gel and immersed in an extraction solution (150 mM NaCl, 15 mM Tris-HCl, pH 7.5, 0.2% DDM, 0.5% Genapol X-80, 5 mM TCEP, 10 mM EDTA) for 12 h. The extract was collected and stored at −80 °C for subsequent use.

Preparation of MspA modified with a PBA

To modify (N90C)1(M2)7 with a phenylboronic acid, 1 μl of prepared (N90C)1(M2)7, 0.2 μl MPBA (1 M, dissolved in dimethyl sulfoxide) and 8.8 μl KCl buffer (1.5 M KCl, 10 mM MOPS, pH 7.0) were mixed and incubated for 10 min. For simplicity, the PBA-modified MspA is referred to as MspA-PBA throughout the paper, unless otherwise stated.
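As a quick check of the labelling mixture above, the final concentrations in the 10 μl reaction follow from simple dilution. This worked arithmetic assumes additive volumes and ignores any salt carried over from the protein extraction solution, neither of which the text states explicitly:

$$V_{\mathrm{tot}} = 1 + 0.2 + 8.8 = 10\ \mu\mathrm{l}$$

$$[\mathrm{MPBA}] = 1\ \mathrm{M}\times\frac{0.2\ \mu\mathrm{l}}{10\ \mu\mathrm{l}} = 20\ \mathrm{mM},\qquad [\mathrm{KCl}] = 1.5\ \mathrm{M}\times\frac{8.8\ \mu\mathrm{l}}{10\ \mu\mathrm{l}} = 1.32\ \mathrm{M},\qquad \mathrm{DMSO} = \frac{0.2\ \mu\mathrm{l}}{10\ \mu\mathrm{l}} = 2\%\ (\mathrm{v/v})$$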

Nanopore measurements

Nanopore measurements were performed as described previously50. To avoid interference from the measurement environment, the custom-made measurement device was placed in a homemade Faraday cage mounted on an optical table (Jiangxi Liansheng technology Co., Ltd). The two chambers of the measurement device were separated by a Teflon film containing an orifice 100 μm in diameter. Before each use, the orifice was treated with 0.5% (v/v) hexadecane in pentane. Both chambers were then filled with 500 μl KCl buffer (1.5 M KCl, 10 mM MOPS, pH 7.0). A pair of Ag/AgCl electrodes, connected to a patch-clamp amplifier, was immersed in the buffers of the two chambers. By convention, the electrically grounded chamber is defined as cis and the opposing chamber as trans. A drop of 5 mg ml−1 DPhPC in pentane was added to each chamber, and the liquid in each chamber was pipetted up and down several times until a lipid bilayer formed. Octameric (N90C)1(M2)7 or MspA-PBA was then added to cis to trigger spontaneous pore insertion into the lipid bilayer. Once a single nanopore had inserted, the buffer in cis was exchanged with fresh buffer to prevent further insertions.

All single-channel recordings were performed with an Axopatch 200B patch-clamp amplifier coupled with a Digidata 1550B digitizer. The sampling rate was 25 kHz, and the acquired traces were further digitally low-pass filtered with a corner frequency of 1 kHz. Unless otherwise stated, a +200 mV voltage was applied continuously during all measurements. All analytes were added to the cis chamber to the desired concentration.
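The digital low-pass step described above can be sketched with SciPy as follows. This is an illustration, not the authors' processing chain: the Methods give only the sampling rate (25 kHz) and the corner frequency (1 kHz), so the filter family (Bessel), its order (8) and the helper name `lowpass` are assumptions. A causal `sosfilt` is used rather than zero-phase `sosfiltfilt`, because forward-backward filtering squares the magnitude response and moves the −3 dB point below the stated corner frequency.

```python
import numpy as np
from scipy import signal

FS = 25_000.0       # sampling rate (Hz), as stated in the Methods
F_CORNER = 1_000.0  # low-pass corner frequency (Hz), as stated in the Methods


def lowpass(trace: np.ndarray, fs: float = FS, fc: float = F_CORNER, order: int = 8) -> np.ndarray:
    """Causal low-pass filter with its -3 dB point at fc.

    ASSUMPTION: Bessel family and order 8; the source states only the corner frequency.
    """
    # norm="mag" places the -3 dB magnitude point at fc (SciPy's default "phase" does not)
    sos = signal.bessel(order, fc, btype="low", norm="mag", output="sos", fs=fs)
    # Start the filter in steady state at the first sample to avoid a start-up transient
    zi = signal.sosfilt_zi(sos) * trace[0]
    filtered, _ = signal.sosfilt(sos, trace, zi=zi)
    return filtered


# Usage on 10 s of synthetic open-pore current (pA); the values are made up for illustration
rng = np.random.default_rng(0)
raw = 100.0 + rng.normal(0.0, 5.0, size=250_000)
smooth = lowpass(raw)
```

With white noise, roughly the fraction 1 kHz / 12.5 kHz of the noise power survives the filter, so the standard deviation of `smooth` should be well below that of `raw`.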

Data analysis

Raw Axon abf files were imported into MATLAB using the ‘abfload’ function downloaded from https://www.mathworks.com/matlabcentral/fileexchange/6190-abfload. The characteristic parameters of each event, including %Ib, SD, toff and ton, were extracted with a custom MATLAB program. Events with toff < 10 ms were ignored. Subsequent analyses, including histograms, scatter plots, violin plots and curve fitting, were performed in Origin v.9.1 (OriginLab).
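The event-extraction step can be illustrated with a simple threshold detector. This is a hypothetical sketch in Python, not the custom MATLAB program used here: the helper `detect_events`, the 10% detection threshold and the parameter definitions are assumptions. In particular, %Ib is taken here as the fractional current decrease relative to the open-pore current I0, toff as the event dwell time and ton as the interval since the previous retained event. Only the 10 ms minimum toff comes from the text.

```python
import numpy as np


def detect_events(trace, fs, open_current, threshold_frac=0.1, min_toff=0.010):
    """Threshold-based detection of blockade events in a filtered current trace.

    ASSUMED definitions (hypothetical, for illustration only):
      pct_Ib : (I0 - mean event current) / I0 * 100, with I0 = open_current
      sd     : standard deviation of the current within the event
      t_off  : event duration (dwell time), in s
      t_on   : time since the end of the previous retained event, in s (NaN for the first)
    Events shorter than min_toff (10 ms in the Methods) are discarded.
    """
    trace = np.asarray(trace, dtype=float)
    blocked = trace < open_current * (1.0 - threshold_frac)
    # Pad with zeros so every blocked run has a rising (+1) and a falling (-1) edge
    edges = np.diff(np.concatenate(([0], blocked.astype(np.int8), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)  # exclusive end index of each run

    events = []
    prev_end = None
    for s, e in zip(starts, ends):
        t_off = (e - s) / fs
        if t_off < min_toff:
            continue  # too short to characterize reliably
        segment = trace[s:e]
        events.append({
            "pct_Ib": (open_current - segment.mean()) / open_current * 100.0,
            "sd": segment.std(),
            "t_off": t_off,
            "t_on": np.nan if prev_end is None else (s - prev_end) / fs,
        })
        prev_end = e
    return events


# Usage on a synthetic, noise-free trace: two 20 ms events and one 2 ms event that is rejected
fs = 25_000


def seg(level, seconds):
    return np.full(int(round(seconds * fs)), level)


trace = np.concatenate([seg(100, 0.1), seg(70, 0.02), seg(100, 0.05), seg(70, 0.002),
                        seg(100, 0.05), seg(70, 0.02), seg(100, 0.1)])
events = detect_events(trace, fs, open_current=100.0)
```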

Machine learning was performed in MATLAB. Five hundred events of each analyte type were collected to form a dataset, and each event was labelled with the known identity of the analyte. The dataset was then split into a training set (80%) and a testing set (20%) for model training and model testing. %Ib and SD of events were used as event features. Model training was performed using the Classification Learner toolbox of MATLAB. Mainstream classifiers, including Decision Trees, Discriminant Analysis, Naïve Bayes, SVM, K Nearest Neighbour, Ensemble and Neural Network, were evaluated with default settings. Based on the tenfold cross-validation accuracy and the testing accuracy, the Linear SVM model demonstrated the best performance. A confusion matrix and decision boundary were generated from the results of the Linear SVM model. The trained model was then applied to predict unlabelled data.
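The model-selection procedure above (500 labelled events per analyte, an 80/20 split, tenfold cross-validation, then testing of the best model) can be sketched with scikit-learn as a stand-in for MATLAB's Classification Learner. This is a minimal analogue under assumptions: the data are synthetic (not measured values), only a subset of the classifier families is compared, scikit-learn defaults differ from MATLAB's, and feature standardization for the SVM and KNN models is our choice.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT measured values): 500 events per analyte, features = (%Ib, SD)
centres = {"A": (60.0, 1.0), "B": (70.0, 1.6), "C": (80.0, 2.2)}
X = np.vstack([rng.normal(c, (1.0, 0.1), size=(500, 2)) for c in centres.values()])
y = np.repeat(list(centres), 500)

# 80/20 split, stratified so each analyte is equally represented in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

candidates = {
    "Linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Discriminant analysis": LinearDiscriminantAnalysis(),
}

# Tenfold cross-validation accuracy on the training set for every candidate
cv_scores = {name: cross_val_score(model, X_train, y_train, cv=10).mean()
             for name, model in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)

# Refit the best candidate on the full training set, then evaluate on the held-out 20%
best = candidates[best_name].fit(X_train, y_train)
test_accuracy = best.score(X_test, y_test)
cm = confusion_matrix(y_test, best.predict(X_test), labels=list(centres))
```

Selecting on cross-validation accuracy and only then reporting the held-out accuracy keeps the test set out of the model choice.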

One-Class SVM was performed in MATLAB. Five hundred events of each analyte type were collected to form a dataset, and %Ib and SD of events were used as event features. The OutlierFraction was set to 0.0005. DBSCAN cluster analysis was performed in Python, with epsilon set to 0.12 and min_samples set to 18.
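The two-stage labelling of unknown events (novelty detection, then classification of inliers and clustering of outliers) can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' MATLAB/Python code: scikit-learn's `OneClassSVM` parameter `nu` stands in for MATLAB's `OutlierFraction` (both bound the tolerated fraction of training outliers but are not defined identically); features are standardized before both models, so `eps = 0.12` is interpreted in standardized units, which the Methods do not specify; the helper names `build_models` and `label_unknown` and all data are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, OneClassSVM


def build_models(X_train, y_train, nu=0.0005):
    """Fit a scaler, a novelty detector and a linear classifier on labelled events."""
    scaler = StandardScaler().fit(X_train)  # ASSUMPTION: standardized features
    Z = scaler.transform(X_train)
    detector = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(Z)
    classifier = SVC(kernel="linear").fit(Z, y_train)
    return scaler, detector, classifier


def label_unknown(X, scaler, detector, classifier, eps=0.12, min_samples=18):
    """Inliers get a trained class; outliers get a DBSCAN cluster or 'background'."""
    Z = scaler.transform(X)
    labels = np.empty(len(X), dtype=object)
    inlier = detector.predict(Z) == 1  # +1 = matches a trained event type
    if inlier.any():
        labels[inlier] = classifier.predict(Z[inlier])
    if (~inlier).any():
        clusters = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(Z[~inlier])
        # DBSCAN marks non-clustered (noise) points with -1; they are treated as background
        labels[~inlier] = [f"cluster_{k}" if k >= 0 else "background" for k in clusters]
    return labels


# Usage with synthetic (%Ib, SD) events; none of these numbers are measured data
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal((60, 1.0), (1.0, 0.1), (500, 2)),
                     rng.normal((75, 2.0), (1.0, 0.1), (500, 2))])
y_train = np.repeat(["A", "B"], 500)
models = build_models(X_train, y_train)

known = np.vstack([rng.normal((60, 1.0), (0.1, 0.01), (20, 2)),
                   rng.normal((75, 2.0), (0.1, 0.01), (20, 2))])
new_cluster = rng.normal((100, 4.0), (0.05, 0.005), (30, 2))  # an untrained event type
scattered = np.array([[40, 0.2], [110, 0.5], [50, 5.0], [95, 0.3], [30, 3.0]])
labels = label_unknown(np.vstack([known, new_cluster, scattered]), *models)
```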

MicroRNA digestion

S1 nuclease (Takara) was used to enzymatically digest RNA into nucleoside monophosphates (NMPs). Before digestion, the S1 nuclease was pretreated by ultrafiltration (Amicon, Ultra-0.5 ml, Ultracel-10 K) to remove glycerol. After ultrafiltration, the solution remaining in the filter device, which contained the S1 nuclease, was collected. The reaction was then performed by mixing 150 μg microRNA, 21 μl pretreated S1 nuclease solution (180 U μl−1), 6 μl 10X S1 nuclease buffer (300 mM CH3COONa, 2,800 mM NaCl, 10 mM ZnSO4, pH 4.6) and ultrapure water to a final volume of 60 μl. The reaction was kept at 23 °C for 4 h. To separate the digested products, the mixture was added to a centrifugal filter with a 10 kDa molecular weight cut-off (MWCO) and centrifuged at 8,000 r.p.m. for 60 min at 4 °C. The filtrate was collected and stored at 4 °C for subsequent use. All tips and tubes used were RNase-free.

Yeast tRNAPhe digestion

S1 nuclease (Takara) was used to enzymatically digest RNA into nucleoside monophosphates (NMPs). Before digestion, the S1 nuclease was pretreated by four rounds of ultrafiltration (Amicon, Ultra-0.5 ml, Ultracel-10 K) to remove glycerol. In each round, the S1 nuclease solution was added to the centrifugal filter with a 10 kDa MWCO and centrifuged at 8,000 r.p.m. for 60 min at 4 °C. After ultrafiltration, the solution remaining in the filter device, which contained the S1 nuclease, was collected. The reaction was then performed by mixing 50 μg yeast tRNAPhe, 28 μl pretreated S1 nuclease solution (180 U μl−1), 8 μl 10X S1 nuclease buffer (300 mM CH3COONa, 2,800 mM NaCl, 10 mM ZnSO4, pH 4.6) and ultrapure water to a final volume of 80 μl. The reaction was kept at 23 °C for 15 h. To separate the digested products, the mixture was added to a centrifugal filter with a 10 kDa MWCO and centrifuged at 8,000 r.p.m. for 60 min at 4 °C. The filtrate was collected and vacuum dried for 6 h. The powder was stored at 4 °C for subsequent use. All tips and tubes used were RNase-free.
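For comparison, the two digestion set-ups above correspond to the following enzyme loads, buffer strengths and RNA concentrations. This is a worked calculation that takes the quoted 180 U μl−1 as the activity of the pretreated solution, as written, and assumes additive volumes:

$$\text{microRNA: } 21\ \mu\mathrm{l}\times 180\ \mathrm{U}\ \mu\mathrm{l}^{-1} = 3{,}780\ \mathrm{U},\quad \frac{3{,}780\ \mathrm{U}}{150\ \mu\mathrm{g}} = 25.2\ \mathrm{U}\ \mu\mathrm{g}^{-1},\quad \frac{150\ \mu\mathrm{g}}{60\ \mu\mathrm{l}} = 2.5\ \mu\mathrm{g}\ \mu\mathrm{l}^{-1}$$

$$\text{tRNA}^{\mathrm{Phe}}\text{: } 28\ \mu\mathrm{l}\times 180\ \mathrm{U}\ \mu\mathrm{l}^{-1} = 5{,}040\ \mathrm{U},\quad \frac{5{,}040\ \mathrm{U}}{50\ \mu\mathrm{g}} = 100.8\ \mathrm{U}\ \mu\mathrm{g}^{-1},\quad \frac{50\ \mu\mathrm{g}}{80\ \mu\mathrm{l}} = 0.625\ \mu\mathrm{g}\ \mu\mathrm{l}^{-1}$$

$$\frac{6\ \mu\mathrm{l}}{60\ \mu\mathrm{l}} = \frac{8\ \mu\mathrm{l}}{80\ \mu\mathrm{l}} = \frac{1}{10}\ \Rightarrow\ 1\mathrm{X}\ \text{buffer: } 30\ \mathrm{mM}\ \mathrm{CH_3COONa},\ 280\ \mathrm{mM}\ \mathrm{NaCl},\ 1\ \mathrm{mM}\ \mathrm{ZnSO_4}$$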

RNA composition quantification

During nanopore sensing of RNA digestion products, the digested NMP concentrations (Ci) were evaluated according to the following equation:

$$C_i = E_i/\left( {\delta _i \times t} \right)$$

Here, the index i (from 1 to 11) denotes CMP, UMP, AMP, GMP, m5C, m6A, ψ, I, D, m7G and m1A, respectively. Ei is the number of binding events of the corresponding NMP detected during continuous sensing of the RNA digestion products; an example of Ei is shown in Supplementary Fig. 33b,d. δi is the calibration coefficient, defined as the number of NMP binding events occurring per unit concentration (μM) per minute. The values of δi were acquired from measurements with 300 μM of the corresponding NMP at +200 mV and are summarized in Supplementary Table 10. t is the recording time (60 min).

The nucleotide compositions of RNA were derived according to the following equation:

$$N_i = L\frac{C_i}{\sum_{j=1}^{11} C_j}$$

Here, L is the length of the RNA. For hsa-miR-17, L = 23. For hsa-miR-21, L = 22. For yeast tRNAPhe, L = 70.
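The two quantification equations can be sketched in Python as below. The function names and the input numbers are hypothetical and for illustration only. One consequence worth noting: substituting Ci = Ei/(δi t) into the composition equation gives Ni = L (Ei/δi) / Σj (Ej/δj), so the recording time t, and any factor common to all δi, cancels. The estimated composition therefore depends only on the event counts and the relative calibration coefficients.

```python
import numpy as np

# Index order used for E_i and delta_i in the Methods (i = 1..11)
NMPS = ["CMP", "UMP", "AMP", "GMP", "m5C", "m6A", "psi", "I", "D", "m7G", "m1A"]


def calibration_coefficient(n_events, minutes, conc_uM=300.0):
    """delta_i: binding events per uM per min, from a calibration run at a known concentration."""
    return n_events / (conc_uM * minutes)


def concentrations(E, delta, t_min=60.0):
    """C_i = E_i / (delta_i * t), in uM."""
    return np.asarray(E, dtype=float) / (np.asarray(delta, dtype=float) * t_min)


def composition(C, L):
    """N_i = L * C_i / sum_j C_j: estimated number of each NMP per RNA molecule of length L."""
    C = np.asarray(C, dtype=float)
    return L * C / C.sum()


# Hypothetical event counts and calibration coefficients (NOT measured values)
E = np.array([400, 300, 410, 440, 40, 3, 80, 1, 55, 10, 9])
delta = np.full(len(NMPS), 0.05)
C = concentrations(E, delta)
N = composition(C, L=70)  # L = 70 for yeast tRNAPhe
```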