Structural-profiling of low molecular weight RNAs by nanopore trapping/translocation using Mycobacterium smegmatis porin A

Folding of RNA can produce elaborate tertiary structures, corresponding to their diverse roles in the regulation of biological activities. Direct observation of RNA structures at high resolution in their native form, however, remains a challenge. The large vestibule and the narrow constriction of Mycobacterium smegmatis porin A (MspA) suggest a sensing mode called nanopore trapping/translocation, which clearly distinguishes between microRNA, small interfering RNA (siRNA), transfer RNA (tRNA) and 5S ribosomal RNA (rRNA). To further profit from the acquired event characteristics, a custom machine learning algorithm was developed. Events from measurements with a mixture of RNA analytes can be automatically classified with an overall accuracy of ~93.4%. tRNAs, which possess a unique tertiary structure, report a highly distinguishable sensing feature, different from all other RNA types tested in this study. With this strategy, tRNAs from different sources were measured, and a high structural conservation across different species was observed at the single-molecule level.

increased when the applied potential was increased from +50 mV to +150 mV. However, decreased when the potential was further increased from +150 mV to +250 mV. At +250 mV, all events demonstrate significant fluctuations followed by a deep blockage and a spontaneous restoration to the open pore state. These results indicate that successful translocation of 5S rRNA requires overcoming a high entropic barrier. A high applied potential would promote translocation of 5S rRNA. The fluctuation noise observed likely results from electrophoretically driven unfolding of its overall structure.
Thus, the helix I-down conformation is the most likely one when a type 1 event is observed. A type 1 event is less likely to arise from the loop C-down or loop E-down conformation, since a loop structure is much more difficult to unfold electrophoretically than a helix. All measurements were carried out as described in Methods. 5S rRNA was added to cis with a final concentration of 10 nM. Table 1) is the sole analyte. Translocation of overhanged siRNA gives rise to characteristic two-step shaped events (Supplementary Fig. 6). c. A representative trace containing successive blunt siRNA translocation events. Luciferase siRNA serves as the sole analyte. d.

A representative trace containing successive tRNA translocation events. Brewer's yeast phenylalanine-specific tRNA (Sigma-Aldrich), also termed tRNAPhe, serves as the sole analyte. Two types of events with highly distinguishable event characteristics form the majority of all acquired events (Fig. 2b). The current trace from 9.5 s to 16 s has been omitted due to an event with an extremely long residence time. e. A representative trace containing successive 5S rRNA translocation events. E. coli 5S rRNA recovered from polyacrylamide gels serves as the sole analyte. orientation or pore clogging were occasionally observed. Though observable, these events form only a minority of all acquired events and contribute to the type "others" in the machine learning algorithm (Fig. 3a). Please note that a clogged pore can also be manually restored by reversing the applied potential, so that follow-up measurements can be re-initiated. Here, the training set and the testing set consist of model events from different RNA types with previously known identities. Four sets of data acquired from nanopore measurements with a sequential addition of overhanged siRNA, blunt siRNA, tRNA and 5S rRNA are provided as a demo predicting set. The whole workflow is composed of seven steps, as listed below.

Step 1: Feature extraction. Eleven parameters are extracted from each event in the segmented data of the dataset, forming a feature matrix for each event.
Step 2: Model building. The feature matrices and labels of the training set are used, with 10-fold cross-validation, to build the model and fine-tune its parameters. The best-performing trained model is saved locally.
Step 3: Model saving. The trained model is saved locally for quick loading next time.
Step 4: Model testing. The trained model is applied to the feature matrices and labels of the testing set to validate its performance.
Step 5: Model output. Plots of the feature importance, confusion matrix and learning curve of the best-performing classifiers are generated.
Step 6: Model prediction. The feature table of the predicting set is loaded into the established machine learning model for event identification.
Step 7: Classification output. Five folders with sorted events of overhanged siRNA, blunt siRNA, tRNA, 5S rRNA and others are generated.
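
To make the workflow concrete, the following is a minimal Python sketch of Steps 1, 2, 4 and 5 on synthetic data. It is an illustration under stated assumptions, not the authors' implementation: the three features, the nearest-centroid classifier, the synthetic event generator and all names (extract_features, fit_centroids, k_fold_accuracy, the labels "shallow_short" and "deep_long") are hypothetical stand-ins. The source names neither the eleven event parameters nor the classifier it uses. Model saving (Step 3) and folder output (Step 7) are omitted.

```python
import math
import random
import statistics
from collections import Counter, defaultdict

def extract_features(event, open_pore_current):
    """Step 1 (illustrative): reduce one event (list of current samples) to features.
    These three features are assumptions, not the eleven parameters of the paper."""
    return [
        1.0 - statistics.fmean(event) / open_pore_current,  # mean fractional blockade
        statistics.pstdev(event) / open_pore_current,        # relative fluctuation noise
        math.log10(len(event)),                              # log dwell time (in samples)
    ]

def fit_centroids(X, y):
    """Train a nearest-centroid classifier: one mean feature vector per label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {label: [statistics.fmean(col) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict(centroids, row):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))

def k_fold_accuracy(X, y, k=10, seed=0):
    """Step 2 (illustrative): mean held-out accuracy over k cross-validation folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return statistics.fmean(scores)

def confusion_matrix(model, X, y):
    """Steps 4-5 (illustrative): count (true label, predicted label) pairs."""
    return Counter((true, predict(model, row)) for row, true in zip(X, y))

def synthetic_event(rng, blockade, n_samples, open_pore_current=100.0, noise=2.0):
    """Hypothetical event: Gaussian noise around a blocked current level."""
    level = open_pore_current * (1.0 - blockade)
    return [rng.gauss(level, noise) for _ in range(n_samples)]

# Build a toy labelled set with two made-up event classes.
rng = random.Random(42)
OPEN_PORE = 100.0
events, labels = [], []
for _ in range(30):
    events.append(synthetic_event(rng, 0.3, 50))
    labels.append("shallow_short")
    events.append(synthetic_event(rng, 0.8, 500))
    labels.append("deep_long")

X = [extract_features(e, OPEN_PORE) for e in events]
cv_accuracy = k_fold_accuracy(X, labels, k=10)
model = fit_centroids(X, labels)
cm = confusion_matrix(model, X, labels)
print("10-fold CV accuracy:", cv_accuracy)
print("confusion counts:", dict(cm))
```

In a real analysis, the toy classifier would be replaced by whichever model performs best under cross-validation, and the features would be put on a common scale before any distance-based method is applied.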