Introduction

Herbal medicines, also known as phytomedicines, are mixtures of plant metabolites that contain pharmacologically active compounds with some therapeutic properties1. Herbal medicines have been widely used historically for the treatment of numerous diseases, including but not limited to plague2,3,4, cardiac cerebral disease5,6 and pain relief 7,8. The therapeutic effects of herbal medicines are mainly attributable to their bioactive compounds, which are natural products, such as the antimalarial artemisinin from Artemesia annua9,10, the anti-inflammatory aspirin from Salix7 and the analgesic morphine from Papaver somniferum8. Accurate identification and quantification of the bioactive compounds in herbal medicines is essential for the discovery, production and quality control of herbal medicines11. However, due to their complexity, quantitative analysis of natural herbs is non-trivial.

Conventionally, Fourier transform infrared spectroscopy (FTIR)12,13, high-performance liquid chromatography (HPLC)14 and liquid chromatography-mass spectrometry (LC-MS)15 have been applied to the quantitative analysis of herbal medicines but these spectroscopic and chromatographic techniques usually require complicated and time-consuming sample pretreatment. With HPLC, simultaneous detection of multiple kinds of components in herbal medicines under the same measurement condition is difficult16. The overlap in FTIR spectra13 and the complexity of LC-MS data17 also pose challenges for data interpretation. Besides, the required state-of-the-art instruments are generally bulky and expensive and are not suitable for direct and rapid analysis of natural herb samples in a field environment.

Biological nanopore, originally developed for nucleic acid sequencing, is a versatile single molecule sensor of nucleic acids18,19, proteins20,21,22 and small molecules23,24. When appropriately modified with a reactive adapter, a biological nanopore becomes a nanoreactor, which is responsive to chemically compatible small molecules25,26,27. Under sensing in a nanopore, a target analyte specifically binds to and dissociates from the reactive adapter, producing highly characteristic nanopore events resulting from dynamic single molecule reactions occurring in the pore lumen. This mode of sensing is advantageous for recognition of target analytes directly from a complex mixture of compounds, since the interfering molecules in the environment either generate events that are readily distinguishable from the target analytes or fail to generate any events. Natural herbs, which contain both the desired bioactive components and interfering background analytes, are suitable for nanopore sensing. Compared with conventional herb analysis methods12,16,17, nanopore sensing has the advantages of a single molecule resolution, high accuracy, a facile sample separation and a high portability. To the best of our knowledge however, nanopore analysis of herbal medicines has never been previously demonstrated.

As has been recently reported, a Mycobacterium smegmatis porin A (MspA) nanopore modified with phenylboronic acid (PBA), also referred to as MspA-90PBA, demonstrates an exceptional resolution in the identification of cis-diols such as monosaccharides28, alditols29 and ribonucleotides30,31. The PBA adapter also serves to reversibly capture and release the analyte during recording, so that an extended event dwell time is reported. One drawback of this technique is that the PBA adapter is in principle not suitable for trans-diols due to the unmatched spatial conformation (Supplementary Fig. 1). Natural herbal medicines contain various cis-diol compounds including but not limited to phenolic acids32,33,34, saccharides35,36 and anthocyanin37,38. Salvianolic acids are members of a family of the most abundant water-soluble phenolic acids in Salvia miltiorrhiza which have been widely discovered in other herbs such as Rosemary39, Prunella vulgaris38 and Mint40. Naturally occurring salvianolic acids have been widely applied in the treatment of cardiovascular6 and cerebrovascular41 disease. According to the literatures42,43, salvianolic acids, with the exception of protocatechuic acid (PCA) and protocatechualdehyde (PA), are composed of salvianic acid A (SAA) or caffeic acid (CA) which acts as basic building blocks. This suggests that almost all salvianolic acids contain 1, 2-diol groups and can be detected by MspA-90PBA.

In this work, a single phenylboronic acid appended MspA (MspA-90PBA) nanopore (Fig. 1a) is used for the identification and quantification of salvianolic acids in herbal medicines, including injections and natural herbs. Assisted by a custom machine learning algorithm, the high resolution of MspA allows full discrimination between different types of salvianolic acids in natural samples. Moreover, this sensor can also be further integrated into a portable device to assist natural product investigations during fieldwork or for extreme situations. By being equipped with other chemical modifications, this nanopore sensor may also be suitable for a wider variety of herb samples.

Fig. 1: Identification of salvianolic acid B using MspA-90PBA.
figure 1

a The structure of MspA-90PBA. MspA-90PBA is a hetero-octameric MspA modified with a single phenylboronic acid (PBA) adapter at its pore constriction (Methods). The mechanism of salvianolic acid sensing is described on the right. Briefly, the PBA at the pore constriction can react reversibly with a cis-diol group of the analyte to produce a nanopore event. b A cartoon of the herb Salvia miltiorrhiza Bunge. Salvia miltiorrhiza (Danshen, dotted box), which is the root of Salvia miltiorrhiza Bunge, and contains rich levels of salvianolic acids. c The chemical structure of salvianolic acid B (SalB). SalB is a type of salvianolic acids. The 1, 2-diol groups on SalB are marked in red. SalB, which is the most abundant salvianolic acid in Salvia miltiorrhiza, is widely applied in the treatment of cardiovascular and cerebrovascular diseases. d A representative trace containing nanopore events of SalB. The measurement was carried out using MspA-90PBA in a buffer of 1.5 M KCl, 100 mM MOPS, pH 7.0. A + 100 mV voltage was continually applied. SalB was added to cis with a final concentration of 0.1 mM. Three types of events were observed from the trace. For the ease of demonstration, each event was respectively marked with different roman numerals to show their identities. e The scatter plot of \(\varDelta I/{I}_{o}\) versus \(S.D.\) for data acquired as described in (d). To remove background noises, the data in the scatter plot was treated by cluster analysis using DBSCAN (Supplementary Fig. 4). 672 events were demonstrated in the scatter plot. Source data are provided as a Source Data file.

Results

Identification of different salvianolic acids using MspA-90PBA

Unless stated otherwise, all nanopore measurements were performed using MspA-90PBA (Fig. 1a) in a buffer of 1.5 M KCl, 100 mM MOPS, pH 7.0. A + 100 mV bias was continually applied (for details, see Methods). In principle, all chemical components containing a cis-diol structure should react with the phenylboronic acid (PBA) adapter of MspA-90PBA to produce nanopore events28,44. If a molecular analyte contains multiple cis-diol structures, it would be expected to report multiple types of events resulting from different modes of binding. Compounds which fail to react with MspA-90PBA are not reported during nanopore sensing and this chemical selectivity enables recognition of target analytes directly from a mixture of compounds without any need for complex sample pretreatment.

In traditional Chinese medicine, Salvia miltiorrhiza (Danshen) has been widely used to treat cerebrovascular diseases41 (Fig. 1b). In later investigations, salvianolic acid B, ((2R)-2-[(E)-3-[(2R,3R)-3-[(1 R)-1-carboxy-2-(3,4-dihydroxyphenyl)ethoxy]carbonyl-2-(3,4-dihydroxyphenyl)-7-hydroxy-2,3-dihydro-1-benzofuran-4-yl]prop-2-enoyl]oxy-3-(3,4-dihydroxyphenyl)propanoic acid), or SalB, the most abundant salvianolic acid in Salvia miltiorrhiza, was identified as the effective compound45 (Fig. 1c). The chemical structure of SalB includes three separate 1, 2-diol structures, enabling its interactions with MspA-90PBA. In view of the high resolution of MspA, three discriminable event types which respectively originate from different binding modes of SalB with PBA, may also potentially be observed.

To support this, a nanopore measurement was arranged and SalB was added to the cis chamber at a final concentration of 0.1 mM (Methods). Corresponding nanopore events were observed immediately (Fig. 1d). To describe these nanopore events quantitatively, the relative blockage depth \(\varDelta I/{I}_{o}\) (\(({I}_{o}{-I}_{b})/{I}_{o}\)), standard deviation \(S.D.\), dwell time \({t}_{{off}}\) and the inter-event interval \({t}_{{on}},\) are defined as shown in Supplementary Fig. 2. The reciprocal of the inter-event interval (\(1/{\tau }_{{on}}\), N = 3) is linearly correlated with the SalB concentration, which is consistent with a bimolecular model24,46 (Supplementary Fig. 3 and Supplementary Table 1). The event dwell time (\({\tau }_{{off}}\), N = 3) however, is independent of the SalB concentrations, which is consistent with a unimolecular dissociation mechanism (Supplementary Fig. 3 and Supplementary Table 1). The kinetics further confirmed that the observed events were from SalB. According to event features such as \(\varDelta I/{I}_{o}\) and \(S.D.\), three types of events can be clearly observed from the trace (Fig. 1d), and this is more clearly demonstrated in the scatter plot of \(\varDelta I/{I}_{o}\) versus \(S.D.\) (Fig. 1e and Supplementary Fig. 4). To avoid interference from noise events, density-based spatial clustering of applications with noise (DBSCAN), a cluster analysis algorithm, was applied to remove events which fail to form a clear cluster (Supplementary Fig. 4). The ratios of events removed by DBSCAN are summarized in Supplementary Table 2. The count of event clusters is consistent with the number of 1, 2-diol groups of SalB (Fig. 1c), suggesting that each type of event results from a specific binding mode between SalB and PBA.

This sensing principle was also applied to other salvianolic acids including caffeic acid (CA), protocatechuic acid (PCA), protocatechualdehyde (PA), salvianic acid A (SAA), rosmarinic acid (RA), lithospermic acid (LSA), salvianolic acid A (SalA) and salvianolic acid B (SalB)47,48 (Fig. 2a–h). All aforementioned salvianolic acids report unique nanopore events. The count of event types is generally consistent with the number of 1, 2-diol structures of the salvianolic acids being tested, further confirming that the different event types are a result of different binding modes between components of salvianolic acid and PBA. Salvianolic acids containing only a single 1, 2-diol structure, such as CA, PCA and PA, only report a single type of event for each analyte. The scatter plots generated by each type of salvianolic acid are also shown in Supplementary Figs. 411. The ratios of interference events removed by DBSCAN are summarized in Supplementary Table 2. The interference events can be from impurities in the analytes derived from plant extraction49 or chemical degradation of salvianolic acids50,51,52,53. Spontaneous pore gating also contributes to the generation of interference events as well. Representative traces acquired with different salvianolic acids and the corresponding scatter plots are also demonstrated in Supplementary Figs. 12, 13 to show the data quality. When simultaneously compared in the same scatter plot of \(\varDelta I/{I}_{o}\) versus \(S.D.\), nanopore events acquired from different salvianolic acids are clearly distinguishable (Fig. 2i, Supplementary Table 3). To this end, MspA-90PBA has shown direct recognition of up to eight salvianolic acids whose event features are well discriminated by simultaneously considering the event features of \(\varDelta I/{I}_{o}\) versus \(S.D.\). When captured by the PBA adapter and chemically confined in the pore lumen, the analyte may further interact with the amino acid residues of the pore to produce characteristic noises on top of the blockage levels, which is useful for event identification. Salvianolic acids containing multiple 1, 2-diol structures also report multiple event types, and these event types are also distinguishable in the corresponding scatter plot of \(\varDelta I/{I}_{o}\) versus \(S.D.\), acknowledging the high resolution of this engineered MspA sensor.

Fig. 2: Discrimination of eight salvianolic acids using MspA-90PBA.
figure 2

ah The chemical structures of eight types of salvianolic acids and their corresponding nanopore events. The salvianolic acids include caffeic acid (CA), protocatechuic acid (PCA), protocatechualdehyde (PA), salvianic acid A (SAA), rosmarinic acid (RA), lithospermic acid (LSA), salvianolic acid A (SalA) and salvianolic acid B (SalB). The 1, 2-diol groups of each compound are marked in red. The abbreviations of each analytes are also marked with color bands, including black (CA), pink (PCA), red (PA), green (SAA), blue (RA), wine-red (LSA), lavender (SalA) and orange (SalB). All measurements were carried out using MspA-90PBA in a buffer of 1.5 M KCl, 100 mM MOPS, pH 7.0 (Methods). CA (1 mM), PCA (2 mM), PA (0.5 mM), SAA (0.5 mM), RA (0.3 mM), LSA (0.2 mM), SalA (0.03 mM) and SalB (0.1 mM) were separately added to cis. A + 100 mV bias was continually applied. CA (a), PCA (b) and PA (c) contain a single 1, 2-diol group and only one type of event was reported for each type of analyte. SAA (d), RA (e) and LSA (f), which contain two 1, 2-diol groups, report two types of events. SalA (g) and SalB (h), which contain three 1, 2-diol groups, report three types of events. (i) Top: The scatter plot of \(\varDelta I/{I}_{o}\) versus \(S.D\). of events acquired from all eight types of salvianolic acids. 500 events acquired with each type of analyte were included in the scatter plot (n = 4000). To remove background nomises, all events were treated by cluster analysis using DBSCAN, as described in Supplementary Figs. 411. Bottom: the zoomed-in view of the area marked with a dashed box in the top. Source data are provided as a Source Data file.

Machine learning assisted identification of salvianolic acids

In the field of nanopore research, machine learning has been widely applied to assist data analysis of nucleic acid sequencing54,55 and single molecule sensing56,57. In this work, the data acquired with complex samples leads to the difficulty of event identification by human eyes. Besides, when a large amount of data is involved, event identification automation also becomes urgent. To also quantitatively assist event identification and sensing performance evaluation, a Python-based machine learning algorithm was developed. At the outset, 500 events acquired with each type of salvianolic acids were selected for feature extraction. Different event types generated by the same type of salvianolic acid were not differently labeled during the training process. The event features are also independent of the concentration of the analyte used to produce the event.

A total of 4000 events from all eight types of salvianolic acids were collected and two event features, including \(\varDelta I/{I}_{o}\) and \(S.D.\), were used to build a feature matrix (Fig. 3a and Supplementary Fig. 14). The event label was assigned with the salvianolic acid that generated the event and in this way, a dataset was formed. This dataset was then randomly split into a training set (80% of the dataset) for model training and validation and a testing set (20%) for model testing. Six commonly used models28,30, including K-NearestNeighbor (KNN), Extreme Gradient Boosting (Xgboost), Classification and Regression Tree (CART), Support Vector Machine (SVM), Gradient Boost Decision Tree (GBDT) and Random Forest (RF) were applied for model training and default model parameters were used. The validation accuracy derived from the results of a 10-fold cross validation was used to identify the best performing model. KNN, which produces a 99.0% validation accuracy, is the best performing model (Fig. 3b). All models report a high classification accuracy, suggesting that the data acquired with different salvianolic acids was easily discriminable. All the above trainings were carried out with the default hyperparameter settings.

Fig. 3: The machine learning workflow.
figure 3

a The training dataset. 500 events acquired with each type of analytes, including CA, PCA, PA, SAA, RA, LSA, SalA and SalB, were collected to form the training dataset (top). Two event features, including the relative blockage depth (\(\varDelta I/{I}_{o}\)) and the standard deviation (\(S.D.\)), were extracted from each event to form a feature matrix (bottom). b Training accuracies. Six commonly used models including K-Nearest Neighbor (KNN), Extreme Gradient Boosting (Xgboost), Classification and Regression Tree (CART), Support Vector Machine (SVM), Gradient Boost Decision Tree (GBDT) and Random Forest (RF) were evaluated. KNN, which reports the highest validation accuracy, was selected for all subsequent prediction tasks. c The confusion matrix result of salvianolic acids classification performed by the trained KNN model. d A representative trace acquired by simultaneous sensing of all eight salvianolic acids. The measurement was carried out using MspA-90PBA in a buffer of 1.5 M KCl, 100 mM MOPS, pH 7.0 (Methods). All analytes were added to cis to reach the desired final concentrations and a + 100 mV bias was continually applied. Specifically, the final concentrations of CA and SAA were 40 μΜ, that of PCA was 100 μΜ and that of PA, RA, LSA, SalA and SalB were 20 μΜ. All events were automatically predicted by machine learning and labeled with corresponding labels. e The scatter plot of \(\varDelta I/{I}_{o}\) versus \(S.D.\) of results acquired by simultaneous sensing of all eight salvianolic acids using the same nanopore (n = 4268). Each event was identified by the previously trained KNN model and is color labeled. Source data are provided as a Source Data file.

The confusion matrix generated by prediction of the testing set using the previously trained KNN model is also shown in Fig. 3c (Methods). To evaluate the efficiency of a model, the learning curve was also plotted and no overfitting was seen. According to this learning curve, the validation score reached 0.984 when 291 training samples were included (Supplementary Fig. 15).

The trained KNN model was then employed to predict unlabeled events during simultaneous sensing of all eight salvianolic acids. The measurement was carried out as described in Fig. 3d. In the representative trace, characteristic nanopore events of all eight types of salvianolic acids were clearly seen based on their event features, consistent with the results produced when different salvianolic acids were tested separately (Supplementary Figs. 12, 13). All acquired nanopore events described in Fig. 3d were collected to perform feature extraction (Supplementary Fig. 14). To reduce interference of non-clustered background noises, the events were further treated by DBSCAN (Supplementary Fig. 16). Subsequently, the events were directly transmitted to the previously trained KNN model for event identification. All identified events were labeled on the trace (Fig. 3d, Supplementary Movie 1) or color labeled in the corresponding scatter plot (Fig. 3e, Supplementary Fig. 17). Although only two event features including \(\varDelta I/{I}_{o}\) and \(S.D.\) were employed for machine learning, the produced prediction accuracy is sufficiently good based on results shown in model validation (Fig. 3b), confusion matrix (Fig. 3c), learning curve production (Supplementary Fig. 15) and simultaneous analyte sensing and prediction (Fig. 3d, e). These results suggest that the event features of all eight salvianolic acids can be significantly discriminated from one another and the measurement consistency between pores is satisfactory. Considering the influence of different features on the prediction results, the importance of \(\varDelta I/{I}_{o}\) and \(S.D.\) are evaluated for eight analyte discrimination. The superimposed histograms of SalB, SalA and RA were shown in Supplementary Fig. 18 and demonstrated the fact that some salvianolic acids are indistinguishable when either \(\varDelta I/{I}_{o}\) or \(S.D.\) was employed. Furthermore, the mutual information between these two features were also calculated (Supplementary Fig. 18). Here, the mutual information value measures the correlation between the features and event labels, where a higher value indicates a closer correlation and the more important this feature is. The values of the two features are similar, indicating the equal importance of \(\varDelta I/{I}_{o}\) and \(S.D.\) in event identification. Definitely, more event features could be further included in the machine learning model building when additional types of events need to be simultaneously distinguished.

Rapid identification of salvianolic acids from salvianolate injection

The above demonstrated sensing capacity and the custom data analysis algorithm can be used for rapid identification of salvianolic acids components directly from an actual biological sample. Salvianolate injection is a natural herbal medicine derived from extracts of Salvia miltiorrhiza for the clinical treatment of coronary heart disease58,59. Although the magnesium salt of SalB is the main bioactive component in the salvianolate injection45, the production of salvianolate injection45,60 which includes extraction of Salvia miltiorrhiza with ethanol or hot water, macroporous resin adsorption, ethanol gradient elution, concentration and drying, resulting in multiple types of salvianolic acids in the injection sample. This suggests that a nanopore assay may assist in the characterization of the SalB components in the salvianolate injection sample and could provide a standard of medicinal quality control.

Nanopore measurements with salvianolate injection was carried out as shown in Fig. 4a. After setting up the measurement, a 4 μL injection sample was directly added to the cis side, without performing any sample pretreatment. Immediately after this, successive resistive pulses were observed in the nanopore trace (Fig. 4b). The event features of all nanopore events were extracted to generate the scatter plot and background noise reduction was performed by the previously described DBSCAN algorithm (Supplementary Fig. 19). The corresponding ratio of interference events removed by DBSCAN is demonstrated in Supplementary Table 2. After noise reduction, a scatter plot of \(\varDelta I/{I}_{o}\) and \(S.D.\) of results acquired with the salvianolate injection was generated (Fig. 4c). All events were predicted by the previously trained KNN model. According to the prediction results, components such as SalB, RA and LSA were clearly identified from the injection sample, and chemical components other than SalB were detectable from the salvianolate injection sample, consistent with results produced by HPLC measurements in previous reports61,62. The same conclusion was drawn from results acquired in three independent trials (Supplementary Figs. 19, 20), confirming the reproducibility of the measurement and the validity of the conclusion. Besides SalB, RA and LSA, traces of CA, SAA and PCA were also detected and identified using the previously trained KNN model (Fig. 4d and Supplementary Fig. 20). This result has not been previously reported by HPLC measurements of salvianolate injection, demonstrating the superior resolution and sensitivity of nanopore in the detection of trace amounts of target analytes.

Fig. 4: Rapid analysis of salvianolic acids in salvianolate injection.
figure 4

a The workflow of salvianolate injection analysis. Left: The powder of salvianolate injection was dissolved in Milli-Q water to reach a 5 mg/mL concentration. Center: 4 μL dissolved salvianolate injection was added to the cis chamber of a nanopore device. The measurement was carried out using MspA-90PBA in a buffer of 1.5 M KCl, 100 mM MOPS, pH 7.0 (Methods) and a bias of +100 mV was continually applied. Right: Corresponding nanopore events observed immediately. (b) A representative trace acquired during salvianolate injection analysis. The events were identified by the trained KNN model and are labeled accordingly. (c) The scatter plot of \(\varDelta I/{I}_{o}\) versus \(S.D.\) of events acquired with the salvianolate injection. The events in the scatter plot were taken from a 30 min continually recorded trace and a total of 846 events were collected. The events were labeled according to the prediction results performed by the previously trained KNN model. (d) The proportion of salvianolic acid events in the salvianolate injection. Data were presented as mean ± standard deviation values derived from results of three independent measurements (N = 3) (Supplementary Fig. 20). The error bars represent standard deviation values. Clearly, SalB is the main component of the salvianolate injection. However, other salvianolic acid components were also detected by nanopore. Source data are provided as a Source Data file.

Subsequently, to verify that the measured value of the content of the magnesium salt of SalB was consistent with the amount indicated in the product manual, the SalB content in the injection was further quantified. The calibration curve of pure SalB with a coefficient factor (\({R}^{2}=0.978\)) is shown in Supplementary Fig. 21. From the KNN model prediction, SalB events in the salvianolate injection were identified automatically (Supplementary Fig. 20). The \(1/{\tau }_{{on}}\) value of SalB was derived (Methods and Supplementary Table 4), according to which, the SalB content in the injection was calculated according to the calibration curve. Based on results of three independent measurements, the mass of the magnesium salt of SalB in the salvianolate injection was derived to be ~35.1 ± 0.2 mg (Supplementary Fig. 21 and Supplementary Table 4), consistent with the reference value in the product manual, which is 40 mg (Supplementary Fig. 21). All the above results confirm that the salvianolic acids components in the salvianolate injection can be well characterized by MspA-90PBA. Though the main component of the salvianolate injection was confirmed to be SalB, the nanopore detected other components such as RA, LSA, CA, SAA and PCA. To verify the effectiveness of the quantitative results, a mixture with predetermined concentrations of eight salvianolic acids was also measured and quantitatively analyzed as described in Supplementary Fig. 22. In principle, the concentrations of all salvianolic acids can be calculated by their corresponding calibration curves as described in Methods. Here, taking SalB as an example (Supplementary Fig. 22), the concentration of SalB in the mixture was derived to be ~ 23.6 μΜ, highly consistent with actual value, ~ 25 μΜ, confirming the capacity of nanopore in the quantification of bioactive compounds in herbal medicines. All these results suggest that the demonstrated nanopore assay is potentially suitable for quality control, drug screening or pharmacokinetics analysis of herbal medicines.

Rapid identification of salvianolic acids in natural herbs

The sensing configuration described here is also suitable for detection of salvianolic acids directly from natural herbs. Salvia miltiorrhiza, the root of Salvia miltiorrhiza Bunge, was used historically in China to treat cardiovascular disease6, and specifically, angina pectoris63 and myocardial infarction64. However, the complex composition of Salvia miltiorrhiza may pose challenges for the quality control of the herbal medicine and in addition, its clinical efficacy and safety cannot be guaranteed. The chemical constituent in Salvia miltiorrhiza has been studied extensively for over 80 years65. According to published reports34,48,66, salvianolic acids found in extracts of Salvia miltiorrhiza include SalB, SalA and RA. In clinical research, salvianolic acids were found to have important pharmacological effects such as scavenging free radicals67, and affected Na/K-ATPase68 activity, thereby minimizing cardiovascular and cerebrovascular damage. Although the bioactive constituents of Salvia miltiorrhiza have been intensively studied with chromatographic techniques48,69,70, a rapid single-molecule characterization of Salvia miltiorrhiza has never been reported to date.

The general workflow of Salvia miltiorrhiza analysis by nanopore is described in Fig. 5a and Supplementary Fig. 23a. Briefly, Salvia miltiorrhiza was crushed and soaked in deionized water at 4 °C for 12 h to extract salvianolic acids. The soaking liquid was centrifuged at 4 °C and 1500 g for 10 min and then the supernatant was collected and ultrafiltered with a 3 kDa ultrafiltration tube at 4 °C and 1900 g for 30 min. Subsequently, 20 μL of the filtrate was added to the cis chamber to initiate the measurement (Fig. 5b, c) and data analysis by machine learning was carried out with collected nanopore events. Distinct from reported strategies of herbal medicine analysis69,71, this strategy does not require any chromatographic methods for separation.

Fig. 5: Rapid identification of salvianolic acids in natural herbs.
figure 5

a A workflow of nanopore identification of salvianolic acids directly from natural herbs. The gray timeline stands for the time of the whole procedure and red bars represent the time of human operation. Phase I: Sample pretreatment. Natural herbs were crushed and soaked in Milli-Q water for 12 h at 4 °C. Human operations: herb crushing and soaking (5 min). Phase II: Liquid collection. The soaking liquid was centrifuged at 4 °C and 1500 g for 10 min and the supernatant was collected. Human operations: centrifugation preparation (1 min) and supernatant collection (1 min). Phase III: Ultrafiltration. The collected supernatant was treated with a 3 kDa ultrafiltration tube at 4 °C and 1900 g for 30 min and the filtrate was collected. Human operations: ultrafiltration preparation (1 min) and filtrate collection (1 min). Phase IV: Nanopore sensing. 20 μL filtrate was added to the cis chamber of a nanopore device. Human operations: sample addition (10 s). Phase V: Data analysis. Human operations: automatic data analysis by machine learning (2 min). A more detailed workflow was also described in Methods and Supplementary Fig. 23. b, e, h Three types of commercially available natural herbs including (b) Salvia miltiorrhiza, (e) Rosemary and (h) P. vulgaris and their corresponding soaking liquids. c, f, i Representative nanopore traces acquired with different herb samples. All events were identified by the trained KNN model and correspondingly labeled as RA (blue), LSA (wine-red), SalB (orange), PA (red), SalA (lavender) and others (black). The ‘other’ events represent events that don’t belong to any previously trained salvianolic acid model compounds, based on results of outlier analysis (Supplementary Figs. 24, 26, 27). d, g, j The proportion of salvianolic acid events from results acquired with (d) Salvia miltiorrhiza, (g) Rosemary and (j) P. vulgaris (Supplementary Figs. 25, 28, 29). Data were presented as mean ± standard deviation values derived from results of three independent measurements (N = 3). The error bars represent standard deviation values. All above described results were acquired by nanopore measurement using MspA-90PBA in a buffer of 1.5 M KCl, 100 mM MOPS, pH 7.0 and a + 100 mV bias, which was continually applied. Source data are provided as a Source Data file.

Two features of all nanopore events, \(\varDelta I/{I}_{o}\) and \(S.D.\) were extracted from the raw traces to generate the scatter plot (Supplementary Fig. 24). According to published reports72, there are in Salvia miltiorrhiza a variety of salvianolic acids and other substances containing cis-diol groups. In order to focus on previously trained salvianolic acids event types, all collected events were further treated by One-Class SVM to remove events that don’t belong to any previously trained dataset (Supplementary Fig. 24). The corresponding ratio of removed interference events is summarized in Supplementary Table 2. After the One-Class SVM treatment, all remaining events were predicted by the previously trained KNN model. All identified events were labeled on the trace (Fig. 5c, Supplementary Movie 2) or color marked in the scatter plot (Supplementary Fig. 25ac), from which SalB, RA, LSA and few of other salvianolic acids could be clearly identified. Three independent trials were carried out (Supplementary Figs. 24, 25), confirming the reproducibility of the measurement and the validity of the conclusion. According to the statistics, the relative content of SalB was identified to be the highest among all eight types of salvianolic acids. This also demonstrates the advantages of Salvia miltiorrhiza as the source material for SalB production (Fig. 5d). The \(1/{\tau }_{{on}}\) values of SalB events were subsequently calculated as described in Methods, according to which, the amount of SalB extracted from Salvia miltiorrhiza was derived based on the calibration curve (Supplementary Table 5). Finally, the SalB in Salvia miltiorrhiza was calculated to be ~ 12 ± 4 mg/g, which is highly consistent with those reported in previous literatures48,73,74 (Supplementary Table 6).

The generality of rapid single molecule detection of salvianolic acids from natural herbs was further verified with Rosmarinus officinalis L (Rosemary) and Prunella vulgaris L (P. vulgaris). Rosemary is a kind of herbal medicine commonly used in natural additives75 and for therapeutic purposes76. It has clinical effects of antioxidant77, anticancer78 and anti-inflammatory79 drugs. The therapeutic properties of Rosemary have been attributed to its phytochemical constituents80, including salvianolic acids and terpenoids. An aqueous extract of Rosemary contains several kinds of salvianolic acids80,81,82,83,84, among which RA is the most abundant component. As for P. vulgaris, the floral spikes of the plant has been generally applied to protect the liver85, alleviate sore throats86 and protect against breast cancer87. Previous investigations have indicated that salvianolic acids are amongst the main bioactive components of P. vulgaris88. Judged by HPLC measurements89,90,91, RA is the dominant salvianolic acid compound in extracts of P. vulgaris, but trace amounts of CA and PA were also detected.

As described in Fig. 5a, a volume of 20 μL ultrafiltration product from Rosemary and P. vulgaris were separately added to the cis chamber of the nanopore device to initiate the measurements. Representative raw traces acquired with these two herbal medicines were demonstrated (Fig. 5e, f, h, i). The interference events, which don’t resemble to any events previously reported by the eight standard salvianolic acids, were removed by One-Class SVM (Supplementary Figs. 26, 27). Compared with standard analytes (Supplementary Table 2), nanopore measurements performed with natural herb extracts report more interference events. It is expected because a variety of cis-diols in natural herbs such as saccharides36 and anthocyanin92 may also bind to the PBA adapter to generate nanopore events. After removal of noise events (Supplementary Figs. 26, 27), all remaining events were predicted by the previously trained KNN model (Figs. 5f, i and Supplementary Movies 3, 4) and marked with color-coded dots in the corresponding scatter plots (Supplementary Figs. 28, 29). Three independent measurements were performed to show the measurement reproducibility. RA events were clearly identified from Rosemary and P. vulgaris. The calibration curve of RA, demonstrating a coefficient factor (\({R}^{2}=0.995\)) is shown in Supplementary Fig. 30 and Supplementary Table 7. The corresponding \(1/{\tau }_{{on}}\) values of RA events acquired with Rosemary and P. vulgaris were measured (Methods) and summarized in Supplementary Table 5. Afterwards, based on the calibration curve, the RA concentration in Rosemary and P. vulgaris were derived to be ~1.26 ± 0.08 mg/g and ~0.53 ± 0.12 mg/g (Supplementary Table 5), which are generally consistent with those previously investigated by HPLC89,93,94,95 (Supplementary Table 6). The differences in RA concentration in our work and that reported in literatures may be due to the different material resource, sample pre-treatments and extraction processes.

According to the nanopore design, measurements performed at extremely high concentrations of analyte would result in the saturation of the PBA adapter and report inaccurate quantification. Thus, the effective concentration range for this measurement is defined to be the range of analyte concentration within which the \(1/{\tau }_{{on}}\) is linearly correlated with the input analyte concentration (Supplementary Fig. 31, Supplementary Table 8). For measurements beyond the effective concentration range, sample enrichment or dilution will become necessary. However, all these nanopore analysis of natural herbs could not be performed using M2 MspA, which lacks a PBA adapter, again confirming the importance of the PBA adapter (Supplementary Fig. 32).

Discussion

An engineered MspA nanopore was applied as a single molecule sensor of salvianolic acids. Eight salvianolic acids, including caffeic acid (CA), protocatechuic acid (PCA), protocatechualdehyde (PA), salvianic acid A (SAA), rosmarinic acid (RA), lithospermic acid (LSA), salvianolic acid A (SalA) and salvianolic acid B (SalB) were tested with this nanopore and their nanopore signatures are fully discriminable. Though all eight salvianolic acids have a wide range of spatial sizes, they can be simultaneously identified by the same nanopore. This high-resolution results from the sufficiently narrow pore constriction, which is comparable to target molecule size. The event features are primarily determined by interactions/local physio-chemical environment, leading to the extreme high resolution of the pore. To the best of our knowledge, nanopore sensing of salvianolic acids has never been previously reported. It is also observed that salvianolic acids containing multiple 1, 2-diol groups will report multiple types of binding events, demonstrating the superior resolution of MspA by its distinguishing of different binding modes between salvianolic acids and the PBA adapter. A custom machine learning algorithm was also developed and a 99.0% accuracy was reported. The superior resolution of sensing and the high performance of machine learning enables recognition of salvianolic acid components directly from natural herbs such as Salvia miltiorrhiza, Rosemary and P. vulgaris. No chromatographic separation is necessary and the workflow of sensing only requires a few minutes of human operation. Though only demonstrated with salvianolic acids, the demonstrated principle should be generally suitable for other herbal medicines. Though only two event parameters, including \(\varDelta I/{I}_{o}\) and \(S.D.\) were used in the machine learning model building, the model performance is satisfying, suggesting that the raw data is of a high quality and data separation. In the future, when more analytes were to be simultaneously analyzed, more event parameters, such as skewness, kurtosis and dwell time may be further included. To further expand its sensing capacity, this hetero-octameric MspA may be installed with other reactive adapters, including those based on coordination chemistry25,96,97, disulfide chemistry98 or click chemistry99,100, so that more diverse types of analytes may be sensed. With the increased complexity of the generated event features, the use of machine learning by simultaneous consideration of more event features28 or deep learning101 becomes indispensable. The whole setup may as well be further integrated into a miniaturized chip and used with a highly portable device, for applications in the field or in extreme situations when access to state-of-the-art instruments becomes impossible.

Methods

Preparation of a hetero-octameric Mycobacterium smegmatis porin A nanopore

All nanopore measurements were performed with MspA-90PBA, which is a hetero-octameric MspA specially engineered to contain an appended boronic acid at its pore constriction. In our previous works28,29,30, it was also referred to as (N90C)1(M2)7. To prepare the hetero-octameric MspA, two genes respectively coding for M2 MspA-D16H16 and N90C MspA-H6 were simultaneously placed in a co-expression vector pETDuet-1 (GenScript). Briefly, the gene coding for N90C MspA-H6 was inserted between the restriction site of Nco I and Hind III. The gene coding for M2 MspA-D16H16 was inserted between the restriction site of Nde I and Blp I. The hexa-histidine tag (H6) added to the C-terminus of both genes serve to assist protein purification by nickel affinity chromatography. The sixteen aspartic acid tag (D16) added to the end of M2 MspA-D16H16 serves to generate a molecular weight difference between different assembly types of M2 MspA-D16H16 and N90C MspA-H6, which is critical in the purification of the target hetero-octameric MspA. The desired hetero-octameric MspA assembly, which contains a single unit of N90C MspA-H6 and seven units of M2 MspA-D16H6, is referred to as (N90C)1(M2)7 (Fig. 1a).

By heat shock transformation at 42 °C for 90 s followed with ice incubation, the constructed co-expression vector was transformed into E. coli BL21 (DE3) pLysS competent cells (Sangon Biotech). Then, a single colony was picked up and added to a LB broth with ampicillin (50 μg/mL) and chloramphenicol (34 μg/mL). The mixture was shaken at 37 °C and 175 rpm until OD600 = 0.6. Subsequently, IPTG was added to the LB broth with a final concentration of 0.1 mM and shaken for 24 h at 16 °C and 175 rpm for protein overexpression. The mixture was then centrifuged at 4 °C and 1500 g for 20 min to collect the bacterial pellet. The pellet was then resuspended in a 160 mL lysis buffer (100 mM Na2HPO4/NaH2PO4, 0.1 mM EDTA, 150 mM NaCl, 0.5% (v/v) Genapol X-80, pH 6.5) and heated at 60 °C for 50 min. After cooling to room temperature, the suspension was then centrifuged at 4 °C and 16000 g for 60 min and the supernatant was collected. The supernatant was filtered with a 0.2 μm syringe filter and then loaded to a HisTrapTMHP nickel ion affinity column (GE Healthcare) to obtain target protein MspAs. To further separate (N90C)1(M2)7 from other pore assemblies, a 10% SDS-polyacrylamide gel was used to perform gel electrophoresis of the collected samples from nickel column purification. A + 160 V bias was continually applied for 16 h during the gel electrophoresis. Subsequently, the gel was stained with a coomassie brilliant blue solution (1.25 g coomassie brilliant blue R250, 225 mL MeOH, 50 mL glacial AcOH, 225 mL ultrapure water) for 4 h. Then, the de-staining solution (400 mL methanol, 100 mL glacial acetic acid, replenished with Milli-Q water to a volume of 1 L) was used for gel elution until the protein bands were clearly seen. The gel corresponding to the band of (N90C)1(M2)7 was then excised from the gel. The excised gel was crushed and immersed in an extraction solution (150 mM NaCl, 15 mM Tris-HCl, 0.2% (w/v) DDM, 0.5% (v/v) Genapol X-80, 5 mM TCEP, 10 mM EDTA, pH 7.5) for 12 h. The extracted (N90C)1(M2)7 was immediately used or stored at −80 °C for long-term use.

3-(maleimide) phenylboronic acid modification of (N90C)1(M2)7 MspA

To modify (N90C)1(M2)7 with a single phenylboronic acid, 5 μL (N90C)1(M2)7 and 2.5 μL 3-(maleimide) phenylboronic acid (500 mM, dissolved in DMSO) were mixed in a 43 μL buffer (1.5 M KCl, 100 mM MOPS, pH 7.0) for 10 min. The chemically modified (N90C)1(M2)7 was used during all nanopore measurements. The PBA modified (N90C)1(M2)7 is referred to as MspA-90PBA throughout the paper, if not otherwise stated.

Nanopore measurements and data analysis

All electrophysiological measurements were performed with an Axopatch 200B patch-clamp amplifier paired with a Digidata 1550B digitizer. The custom measurement device was separated by a Teflon film contain a 100 μm diameter orifice. Prior to the measurement, the orifice was treated by 2% (v/v) hexadecane in pentane. During the measurements, each chamber of the device was first filled with a 500 μL buffer (1.5 M KCl, 100 mM MOPS, pH 7.0) and a pair of Ag/AgCl electrodes were inserted into the chambers, in contact with the buffers and electrically connected with the patch clamp amplifier to form a closed circuit. Conventionally, the chamber that is electrically grounded is defined as cis and its opposing chamber is defined as trans. Then a drop of 5 mg/mL DPhPC in pentane was added to each chamber for lipid bilayer formation. By pipetting the buffer up and down in either chamber, the lipid bilayer was spontaneously formed. Afterwards, biological nanopores were added to the cis chamber to trigger pore insertion. Until a single nanopore was inserted, the buffer in the cis chamber was manually exchanged to prevent further nanopore insertions.

All single-channel recordings were sampled at 25 kHz and low-pass filtered with a corner frequency of 1 kHz. This setting of data acquisition is suitable for nanopore events with a dwell time of ~ms. A lower sampling rate is also advantageous to minimize the data size as well. If not otherwise stated, the +100 mV voltage was continually applied during all measurements and all measurements were carried out at room temperature (rt) (23 °C). All analytes were added to cis. All events were detected by Clampfit 10.7.

Event feature extraction

For each raw trace, the start and the end time of each event was identified by Clampfit 10.7 and saved in csv files (Supplementary Fig. 14). The start and the end time were applied as the time stamps to segment an event from the raw trace. Events with a \({t}_{{off}}\) < 30 ms were ignored. The segmented events were then used for further feature extraction using custom Python codes. The extracted features include relative blockage depth (\(\varDelta I/{I}_{o}\)) and standard deviation (\(S.D.\)) and a feature matrix is formed. The mean current amplitude before the start and after the end of each event was calculated to derive the open pore current (\({I}_{o}\)). Each relative blockage depth was derived according to\(\,\varDelta I/{I}_{o}=({{I}_{o}-I}_{b})/{I}_{o}\), where \({I}_{b}\) represents the residual current of an event. After features extraction, the feature matrix results were saved as a csv file for all subsequent machine learning operations.

Machine learning

Machine learning was performed in a Python environment. 500 events acquired with each type of analyte were collected and labeled to form a dataset. The dataset was randomly split into a training set (80% of the labeled data set) and a testing set (20%) for model training and model testing, respectively. The \(\varDelta I/{I}_{o}\) and \(S.D.\) of events were employed as event features. The training set was standardized and then applied in the training using six common models, including K-NearestNeighbor (KNN), Extreme Gradient Boosting (Xgboost), Classification and Regression Tree (CART), Support Vector Machine (SVM), Gradient Boost Decision Tree (GBDT) and Random Forest (RF). According to the 10-fold cross validation results, KNN was selected as the optimum model and was applied for all further data analysis. The confusion matrix and the learning curve generated by KNN were employed for model evaluation. The trained model was saved and further applied for event predictions.

All machine learning models and training data generated in this study have been deposited in Figshare. Please follow the link for data download: https://figshare.com/s/3e3593adb4dfe4999068

Composition quantification of herbal medicine

In the nanopore field, the reciprocal of the interevent interval (\(1/{\tau }_{{on}}\)) is widely known to be correlated with the target analyte concentration. It is also used for the quantitative measure of the analyte concentration102,103. The calibration curve of SalB was generated as the plot of \(1/{\tau }_{{on}}\) versus the SalB concentrations (Supplementary Fig. 21, Supplementary Table 1). During nanopore sensing of the salvianolate injection, all events were predicted by the machine learning algorithm and the \({t}_{{on}}\) of SalB events were picked up for SalB quantification (Supplementary Fig. 20). By single-exponential fitting as described in Supplementary Fig. 2, the values of \(1/{\tau }_{{on}}\) of SalB measured in the salvianolate injection was obtained (Supplementary Table 4). According to the calibration curve, the concentration of SalB in the salvianolate injection was calculated.

For natural herbs, the calculation of SalB content in Salvia miltiorrhiza and RA content in both Rosemary and P. vulgaris was identical to that previously demonstrated with the injection sample. The calibration curve of SalB and RA were generated as the plot of \(1/{\tau }_{{on}}\) versus the SalB and RA concentrations (Supplementary Figs. 21, 30, Supplementary Tables 1 and 7). During nanopore sensing of natural herbs, all events were predicted by the machine learning algorithm after being treated with outlier analysis. The \({t}_{{on}}\) of SalB events in Salvia miltiorrhiza (Supplementary Fig. 25) and the \({t}_{{on}}\) of RA events in both Rosemary (Supplementary Fig. 28) and P. vulgaris (Supplementary Fig. 29) were picked up for quantification. By single-exponential fitting as described in Supplementary Fig. 2, the values of \(1/{\tau }_{{on}}\) of SalB in Salvia miltiorrhiza and RA in both Rosemary and P. vulgaris measured in natural herbs were obtained (Supplementary Table 5). According to the calibration curve, the concentrations of SalB in Salvia miltiorrhiza and RA in both Rosemary and P. vulgaris were separately calculated.

Pretreatments of natural herbs Salvia miltiorrhiza, Rosemary and P. vulgaris

Three natural herbs were identically pretreated as described in Supplementary Fig. 23. The commercially available natural herbs were first crushed into powder. 2 g herb powder was added to 40 mL of Milli-Q and soaked for 12 h at 4 °C. Subsequently, the soaking liquid was centrifuged for 10 minutes (4 °C, 1500 g) and then filtered with a 3 kDa ultrafiltration tube for 30 minutes (4 °C, 1900 g) to collect the filtrate. Finally, 20 μL filtrate of each natural herb was added to the nanopore device to initiate the measurement.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.