A molecular classification of diseases that accurately reflects clinical behaviour lays the foundation of precision medicine. The development of in silico classifiers coupled with molecular implementation based on DNA reactions marks a key advance in more powerful molecular classification, but it nevertheless remains a challenge to process multiple molecular datatypes. Here we introduce a DNA-encoded molecular classifier that can physically implement the computational classification of multidimensional molecular clinical data. To produce unified electrochemical sensing signals across heterogeneous molecular binding events, we exploit DNA-framework-based programmable atom-like nanoparticles with n valence to develop valence-encoded signal reporters that enable linearity in translating virtually any biomolecular binding events to signal gains. Multidimensional molecular information in computational classification is thus precisely assigned weights for bioanalysis. We demonstrate the implementation of a molecular classifier based on programmable atom-like nanoparticles to perform biomarker panel screening and analyse a panel of six biomarkers across three-dimensional datatypes for a near-deterministic molecular taxonomy of prostate cancer patients.
Precision medicine calls for the development of a disease-specific molecular classification method that accurately reflects clinical behaviour1,2,3,4. A consistent research trend has been to obtain massive amounts of data on multidimensional molecules, including DNA/RNA, proteins and small molecules, which triggers growing interest in using multiple molecular datatypes to better classify diseases2,5,6,7,8,9,10,11,12. For example, the World Health Organization incorporated molecular indicators (for example, cyclin-dependent kinase inhibitor 2A/B homozygous deletion and an isocitrate dehydrogenase mutant) for the classification of tumours of the central nervous system in the 2021 revision of the World Health Organization classification, providing illustrative examples of the new paradigm of integrated molecular classification13. Nevertheless, the heterogeneity of data obtained from various types of technologies accordingly increases and raises grand challenges in data integration and interpretation14,15,16,17. Examples include the heterogeneity in measurement sensitivity between RNA sequencing and chromatin immunoprecipitation sequencing, which causes significant gene expression variations that cannot be mirrored by chromatin modifications18. Hence, extensive computing-intensive data filtering and systematic normalization are indispensable to enable effective multidimensional data integration19,20.
Advances in developing in silico classifiers coupled with DNA-reaction-based molecular implementation provide a powerful and potentially generalizable means of molecular classification21,22 (Fig. 1a,b). Seelig and coworkers designed an in silico classifier model that could translate parameters and mathematical functions into a class of DNA probe reporters to realize multi-gene classification for the diagnosis of early cancers and respiratory infections23. Similarly, Han and coworkers demonstrated a molecular classifier that could analyse different microRNAs (miRNAs) in lung cancer serum samples with a diagnosis precision of 86.4% (ref. 24). The binding events between a target (DNA/RNA) and multiple single-stranded DNA reporters were uniformly translated to an assignment of weights for in silico analysis. However, the extension of this method to the dimensions of proteins or metabolic small molecules is difficult to implement due to the heterogeneous nature of these binding processes. A remaining challenge to realize DNA-based multidimensional molecular classifiers is thus to develop a signal reporter that can translate the heterogeneous, multidimensional molecular information into a unified output signal in a programmable manner (Fig. 1a).
The precision and programmable nature of the Watson–Crick base pairing of DNA delivers a spectrum of valence-controlled programmable atom-like nanostructures (PANs) for colloidal assembly with different compositions, sizes, chiralities and linearities25,26. In particular, self-assembled DNA tetrahedral frameworks (DTFs) provide a simple means to fabricate three-dimensional PANs with an ordered structure and versatile modification27,28,29,30. Here we introduce a PAN-based molecular classifier that can physically implement the computational classification of multidimensional molecular clinical data. The atom-like and programmable nature of a DTF supports the design of valence-controlled PAN signal reporters, resulting in linearity in translating virtually any class of molecular binding to unified electrochemical sensor signals (Supplementary Fig. 1). We demonstrate that the use of a PAN reporter allows precise weight assignment for multidimensional molecular information in computational classification, which is employed to interactively analyse a panel of six biomarkers across three-dimensional datatypes (RNA, protein and metabolic small molecule) for the classification of prostate cancer (PCa) patients. Moreover, we further developed a diagnosis panel screening system using PAN reporters for a classification related to the Gleason score.
Construction and characterization of PAN reporters
Figure 1c,d shows the general design principle for a DNA-encoded molecular classifier, which physically implements an in silico classifier for multidimensional molecular data with an electrochemical sensing system (Fig. 1e,f and Supplementary Fig. 2). To produce unified electrochemical sensing signals across heterogeneous molecular binding events, we designed valence-encoded PAN reporters using DTF-based PANs with n valence capable of targeting each target molecule across multiple dimensions. More importantly, we envisioned that the use of valence-encoded PAN reporters might encode PANs with a defined number of signal moieties, allowing for the physical implementation of a weight assignment (for example, 1, 2 or n) of the in silico classifier by anchoring 1, 2 or n signal moieties on a PAN reporter. Then, the signal gain from each target molecule would be linearly proportional to the number of signal moieties on the PAN reporter, which enables one to weigh each target molecule according to its importance in the in silico classifier.
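The weighting principle described above can be sketched in a few lines of Python. This is an illustrative sketch only: `PanReporter`, `weighted_score`, the `sign` field and the `unit_signal` parameter are hypothetical names, and treating a negative weight as a sign applied during analysis is an assumption of the sketch rather than a description of the physical implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanReporter:
    """Hypothetical model of a valence-encoded reporter."""
    target: str
    valence: int   # number of signal moieties anchored on the reporter
    sign: int = 1  # assumption: +1 or -1, applied when the signals are summed


def weighted_score(reporters, bound_fraction, unit_signal=1.0):
    """Sum the per-target signals, each proportional to the reporter valence.

    bound_fraction maps a target name to the fraction of probes occupied
    (a stand-in for the target-dependent response); unit_signal is the
    signal contributed by one moiety at full occupancy.
    """
    total = 0.0
    for r in reporters:
        total += r.sign * r.valence * unit_signal * bound_fraction[r.target]
    return total


# Two hypothetical targets: weight +3 for "A" and weight -1 for "B".
panel = [PanReporter("A", 3), PanReporter("B", 1, sign=-1)]
score = weighted_score(panel, {"A": 0.5, "B": 0.25})
print(score)
```

Because the valence enters as a simple multiplier, changing a weight only requires changing the number of moieties on the corresponding reporter, not the read-out.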
To fabricate DTF-based PAN reporters, we first assembled a DTF containing a handle DNA on a vertex by mixing seven DNA fragments of 58 nucleotides (58-nt) and one handle DNA-containing DNA fragment of 81-nt in stoichiometric equivalents in buffer (Supplementary Fig. 3). We heated the mixture to 95 °C and then rapidly cooled it to 4 °C. The DTFs were assembled with a high yield of ~95%, characterized by atomic force microscopy (AFM; Supplementary Fig. 4) and polyacrylamide gel electrophoresis (PAGE; Supplementary Fig. 5). We measured a typical edge length of ~12 nm for DTFs (37 base pairs for each edge), which was consistent with its theoretical length31. To form the PAN reporter containing more anchoring sites of signal moieties, we coupled one DTF to another DTF to form DTF dimer structures through the hybridization of a linker DNA and the handle DNA in the two DTFs. The DTF dimer we formed had a dumbbell-shaped structure (with ~95% yield), as shown by AFM and PAGE imaging (Fig. 2a and Supplementary Figs. 5 and 6).
To validate the valence-encoded PAN reporters, which may encode PANs with a defined number of signal moieties, we employed fluorophore labels (for example, cyanine-3 (Cy3)) as signal moieties on PAN reporters and characterized the precise number of signal moieties on the PAN reporters via a single-molecule technique, total internal reflection fluorescence microscopy (TIRFM). PAN reporters containing a defined number of signal moieties (1, 2 or n) were realized by anchoring 1, 2 or n fluorophores on the vertices of a DTF dimer. We observed that the fluorescence intensity of the PAN reporters in bulk solution was linearly proportional to the number of signal moieties (R-squared (R2) > 0.986; Fig. 2b). Moreover, aggregation-caused quenching is avoided because the fluorophores are separated by the ~12 nm edge length of the DTF32 (Supplementary Fig. 7). Similarly, the fluorescence intensity of a single PAN reporter increased linearly as the number of Cy3 labels increased from one to six (R2 > 0.998) in TIRFM measurements (Fig. 2c and Supplementary Fig. 8a). We also observed stepwise single-molecule fluorescence photobleaching33: the PAN reporter containing six Cy3 labels showed six photobleaching steps, and the PAN reporters containing one to five Cy3 labels showed one to five steps (Fig. 2d and Supplementary Figs. 8b and 9). Thus, the number of signal moieties on PAN reporters was precisely controlled from one to six.
We next asked whether PAN reporters possess the orthogonality to accommodate programmed multicolour reporters. We anchored two types of fluorophores with different emissions, but without fluorescence resonance energy transfer between them (Supplementary Fig. 10), on the PAN reporters. To this end, six fluorophores were anchored on single PAN reporters with various number combinations of Alexa Fluor 488 and Cy5. The fluorescence intensity and the number of photobleaching steps of the PAN reporters were linearly proportional to the number of each type of fluorophore, without mutual interference (Fig. 2e and Supplementary Fig. 11). For example, when we anchored one Alexa Fluor 488 fluorophore and five Cy5 fluorophores on a single PAN reporter, we observed one photobleaching step for Alexa Fluor 488 and five photobleaching steps for Cy5. Thus, the anchoring sites of the PAN reporter were individually controlled, with a defined number of signal moieties even in the presence of multiple distinct signal moieties.
To demonstrate the generality in labelling multiple types of signal moieties on the PAN reporter, we anchored various signalling moieties, including gold nanoparticles (AuNPs; usually used as a signal moiety for mass or colorimetric output)34,35,36 and enzymes (usually used as a signal moiety for fluorescent, colorimetric or electrocatalytic output)37,38,39,40. We visualized the spatial structure of the PAN reporter anchored with AuNPs via transmission electron microscopy (TEM) with a precisely controlled number from one to six (Fig. 2f and Supplementary Fig. 12). Interestingly, the spatial arrangement of AuNPs coincided with the vertices’ arrangement on the DTF dimers, indicating that the signal moieties were well anchored on the PAN reporter. We then used horseradish peroxidase (HRP) as an example to anchor on the PAN reporter. The AFM images showed a precise number of HRPs from one to six on the PAN reporter, as shown in Fig. 2g and Supplementary Fig. 13.
Molecular implementation of weight assignment
An in silico classifier realizes data classification via the assignment of a numerical weight to each piece of data that represents its importance, and then summing the weighted result41. Analogously, a multidimensional molecular classifier translates each molecular input with a weighted sensing signal representing its importance by designing a valence-encoded PAN reporter to program the unified electrochemical sensing signal for the multidimensional molecules.
We developed the weighting system for multidimensional molecules with PAN reporters (Fig. 3a). The essential role of the system was to facilitate the binding event between the probe and target molecule to trigger a weighted electrochemical signal. We used DTFs to pattern recognition probes on the electrode surface according to our previous reports42, leading to a uniform biorecognition layer. We employed a sandwich configuration to translate the molecular binding event into the recruitment of the PAN reporter on the electrode for RNAs and proteins. For example, for RNAs (messenger RNA (mRNA) or miRNA), a single-stranded DNA probe was used as the recognition probe, where base-pairing interactions capture the target RNAs on the electrode surface (Supplementary Fig. 14a,b). The PAN reporter then specifically recognized the overhang portion of the probe–target complex and translated the presence of the target RNAs into a weighted electrochemical signal with HRP as the signalling molecule (Fig. 3b). For proteins, a specific monoclonal antibody was used to capture the target protein on the electrode. Another antibody was then used to form an antibody–protein–antibody sandwich for the target protein (Fig. 3b and Supplementary Fig. 14c). For small molecules, we used an aptamer–DNA duplex as the recognition probe. The small-molecule-to-aptamer binding triggered the release of DNA on the electrode surface, which recruits the PAN reporter via a hybridization between the released DNA and the DNA linker on the PAN reporter (Fig. 3b and Supplementary Fig. 14d). Thus, we designed the weighting system for all the major dimensions of biologically relevant molecules, indicating the generality of our PAN reporter for the weight assignment in multidimensional molecules (Fig. 3a,b).
We experimentally implemented this weighting system by designing a weight assignment with one to six HRPs using a PAN reporter for multidimensional molecules (for example, miRNA, mRNA, proteins and small molecules). The electrochemical signal corresponding to the weight assignments was recorded after the addition of the targets until a steady electrochemical signal was achieved. We observed that the signals were linearly proportional to the weights that were realized through controlling the number of HRP on the PAN reporter (R2 > 0.997) for an RNA of 78-nt, a miRNA of 22-nt, an antigen of ~30,000 daltons and a small molecule with 13 atoms. Thus, this system was suitable for assigning an integer-valued weight to different targets (Fig. 3c).
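The linearity check described above amounts to an ordinary least-squares fit of signal against the number of HRPs, followed by an R-squared calculation. The sketch below illustrates that calculation; the function name `linear_fit` and the signal values are hypothetical placeholders, not measured data from this study.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Number of HRPs per reporter (the assigned weight) ...
weights = [1, 2, 3, 4, 5, 6]
# ... and hypothetical steady-state signal gains in microamperes (illustrative only).
signal_uA = [0.52, 1.01, 1.48, 2.03, 2.49, 3.02]

slope, intercept, r_squared = linear_fit(weights, signal_uA)
print(f"slope = {slope:.3f} uA per HRP, R^2 = {r_squared:.4f}")
```

An R-squared close to 1 together with a small intercept indicates that each additional HRP contributes an approximately constant signal increment, which is the property the weight assignment relies on.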
To further demonstrate the generality of the design, we applied the weighting system to 12 additional biomarkers, including COVID-19 biomarkers (open reading frame 1ab (ORF 1ab), envelope gene (E gene) and nucleocapsid gene (N gene))43; cancer biomarkers (mRNA ROR2, mRNA MEIS2 and circulating tumour DNA ALU115)44; and disease-related miRNAs (miR-21, miR-26a, miR-375, miR-144, miR-153 and miR-183)45. We achieved a signal gain of 3.35 μA for ORF 1ab at a concentration of 1,000 copies μl−1 (~1.66 fM), indicating successful signal translation (Fig. 3d). Analogously, we observed a signal gain of 3.75 μA for ALU115 at a concentration of 1 fM.
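The equivalence between 1,000 copies μl−1 and ~1.66 fM follows directly from the Avogadro constant. The short sketch below reproduces the conversion; the function name `copies_per_ul_to_fM` is a hypothetical helper introduced for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def copies_per_ul_to_fM(copies_per_ul):
    """Convert a copy-number concentration (copies per microlitre) to femtomolar."""
    copies_per_litre = copies_per_ul * 1e6   # 1 L = 1e6 microlitres
    molar = copies_per_litre / AVOGADRO      # mol per litre
    return molar * 1e15                      # 1 M = 1e15 fM


orf1ab_fM = copies_per_ul_to_fM(1000)
print(f"{orf1ab_fM:.2f} fM")
```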
We further explored the implementation of the weighting system in complicated and biologically relevant matrices, including four types of diluted human body fluid (sweat, serum, urine and saliva) and five types of mouse tissue homogenate (heart, kidney, lung, stomach and liver). We observed efficient signal translation and a clear signal gain for the target molecules in each matrix, indicating that our weighting system is suitable for complicated biological samples (Supplementary Figs. 15 and 16).
Validation of the two-dimensional molecular classifier
To experimentally validate a two-dimensional molecular classifier, we employed prostate-specific antigen (PSA), a biomarker in PCa diagnosis, and MEIS2, an mRNA biomarker related to PCa, as the target biomarkers (Fig. 3e)46. We assigned a positive weight of +3 to PSA and a negative weight of −3 or −1 to MEIS2. A positive weight represents the positive correlation and a negative weight represents the negative correlation to disease, while their values indicate their importance. We prepared 64 mimetic samples through mixing these two biomarkers with different concentration combinations (Supplementary Table 1) and measured these biomarkers using our PAN reporter (Fig. 3e, left). After analysing the data via a mathematical function (Result = 3C_{PSA} − 3C_{MEIS2}, where C denotes concentration), we found that the 64 samples were classified into two groups, in agreement with our classifier design (Fig. 3e, right). Moreover, when we changed the weight of MEIS2 from −3 to −1 via a mathematical function (Result = 3C_{PSA} − C_{MEIS2}), those samples were also classified into two groups but with a different thresholding boundary compared with Result = 3C_{PSA} − 3C_{MEIS2} (Supplementary Fig. 17).
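The effect of changing the MEIS2 weight on the decision boundary can be illustrated with a small sketch. The 8 × 8 grid of normalized levels, the decision threshold of zero and the function name `classify` are assumptions made for illustration; the actual concentration combinations are those listed in Supplementary Table 1.

```python
from itertools import product


def classify(c_psa, c_meis2, w_psa=3, w_meis2=-3, threshold=0.0):
    """Return the weighted result and whether it exceeds the threshold."""
    result = w_psa * c_psa + w_meis2 * c_meis2
    return result, result > threshold


# Hypothetical normalized levels for each biomarker (8 x 8 = 64 mimetic samples).
levels = [0, 1, 2, 3, 4, 5, 6, 7]
samples = list(product(levels, levels))

# Samples called positive with weights (+3, -3) and with weights (+3, -1).
positive_3_3 = {s for s in samples if classify(*s)[1]}
positive_3_1 = {s for s in samples if classify(*s, w_meis2=-1)[1]}

print(len(samples), len(positive_3_3), len(positive_3_1))
```

In this toy grid the boundary moves from C_PSA = C_MEIS2 to 3C_PSA = C_MEIS2 when the MEIS2 weight changes from −3 to −1, so more samples fall on the positive side while both weightings still split the samples into two groups.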
In silico training for PCa diagnosis
Next, we attempted to scale up our molecular classifier and employ multidimensional data to classify PCa patients. The workflow is illustrated in Supplementary Fig. 18. To obtain an in silico classifier model for PCa patients’ classification, we used publicly available gene and miRNA profiling data from Gene Expression Omnibus, as well as PSA and sarcosine measurement data from previous works47, for classifier training (Fig. 4a). We analysed the distributions of the multidimensional molecules between the healthy individuals and PCa patients, and the selected molecules were distinguishable between these two groups (Supplementary Figs. 19–22). We further investigated the classification models with our classifier, and the robust validation capabilities were confirmed (Supplementary Figs. 23–25).
We integrated the three datasets into a large dataset to evaluate the application for multidimensional molecules and searched the weight combinations by using several logistic regression models with different optimized emphases (Supplementary Fig. 26). We then selected the precision-optimized model to avoid overtreatment (Fig. 4b,c). The optimal weights obtained included miR-153 (weight = –1), miR-183 (weight = +4), ROR2 (weight = –2), MEIS2 (weight = –3), PSA (weight = +3) and sarcosine (SO; weight = +1). With this set of weights, we achieved a recognition sensitivity of 80%, specificity of 100%, F1-score of 97%, area under the receiver operating characteristic (ROC) curve of 97%, precision of 100% and accuracy of 95% for the validation set (Fig. 4c and Supplementary Fig. 26c; the parameters are presented in Supplementary Table 2). Further, we compared the training and validation sets using standard deviation analysis of the multidimensional targets for PCa diagnosis (Fig. 4d,e). The classifier showed excellent specificity and sensitivity, and its molecular implementation was feasible.
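For reference, the performance measures named above follow the standard definitions based on confusion-matrix counts. The sketch below computes them from hypothetical counts chosen only for illustration; the function name `classification_metrics` is likewise hypothetical, and the numbers are not the validation-set counts of this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall, true-positive rate
    specificity = tn / (tn + fp)            # true-negative rate
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts (illustrative only).
m = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

A precision-optimized model, as selected here to avoid overtreatment, favours a low number of false positives even at the cost of some missed patients.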
PCa diagnosis using multidimensional molecular classifier
We first validated the signal-translating performance of the PAN reporter for six biomarkers of PCa. The electrochemical signal of miRNA exhibited a concentration-dependent linear response with a dynamic range of four orders of magnitude. The detection limit for miRNAs was estimated as 100 fM, allowing for the direct analysis of miRNAs for real samples48 (Supplementary Fig. 27). Similarly, we achieved the sensitive detection of mRNA, PSA and SO with dynamic ranges of three to five orders of magnitude. The detection limits were down to 1 pM for mRNA, 0.05 ng ml–1 for PSA and 10 nM for SO (Supplementary Figs. 28–30). The electrochemical signals were also positively correlated to the weights for each biomarker, in agreement with the trends in Fig. 3c. Thus, we successfully established the weight assignment for the six biomarkers (miR-153, miR-183, ROR2, MEIS2, PSA and SO; Supplementary Figs. 31–34).
We then implemented the molecular classifier for the classification of real clinical samples from 32 PCa patients and 50 healthy individuals (the sample information is summarized in Supplementary Table 3). The workflow for clinical sample classification is presented in Supplementary Fig. 35. As shown in Fig. 5a,b, we successfully employed the PAN reporter to convert the six biomarkers into weighted electrochemical signals using the optimized weight sets (Fig. 5c). We realized an accurate classification between PCa patients and healthy individuals with our molecular classifier (P value < 0.01; Fig. 5d). The ROC curve indicated a high predictive power with an area under the curve (AUC) of 100% using our molecular classifier (Fig. 5d). We obtained a specificity of 100% and sensitivity of 100%, with the optimal cut-off value. By contrast, we obtained an AUC of only 54% with a single miRNA (miR-183) and an AUC of 84% with a single mRNA (ROR2; Fig. 5e and Supplementary Fig. 36).
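The AUC reported above can be read as the probability that a randomly chosen patient receives a higher classifier result than a randomly chosen healthy individual. A minimal sketch of this rank-based calculation is given below; the function name `roc_auc` and the scores are hypothetical, not data from the clinical cohort.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical weighted results (illustrative only).
perfect = roc_auc([3.0, 4.0], [1.0, 2.0])   # every patient above every control
partial = roc_auc([2.0, 3.0], [1.0, 2.5])   # one of four pairs is mis-ordered
print(perfect, partial)
```

An AUC of 100% therefore means that a single cut-off separates all patient results from all healthy results, which is consistent with the reported sensitivity and specificity of 100% at the optimal cut-off.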
Biomarker panel screening using molecular classifier
Biomarker panels have the potential to distinguish between patients in various disease processes49 (for example, patients with various Gleason scores for PCa). The rational design of biomarker panels with optimal weighting more accurately reflects the multiple disease processes of cancer. However, screening for the optimal weighting of each biomarker is challenging. We used serum samples from 12 patients to screen the optimal weighting of the biomarker panel: four samples with a Gleason score of 6, four with a Gleason score of 7 and four with a Gleason score of 8 or 9. We used a panel of miRNAs (miR-32, miR-96, miR-153 and miR-183) as a model system and assigned weights of 1, 2, 3 and 4 to each miRNA using our PAN reporter’s weighting system. Combining the weighted signals from the miRNAs gave 2,048 weight combinations. The results were used for clustering analysis to screen the optimal weighting set of the biomarker panel (Fig. 6a,b). As shown in Fig. 6c, top five correlation analysis allowed for the classification of three groups according to the Gleason scores. The optimal weighted result was Result = 3C_{miR-32} − C_{miR-96} + C_{miR-153} − 2C_{miR-183} (Fig. 6d), indicating the ability of our molecular classifier to perform biomarker panel screening.
In summary, we developed valence-encoded PAN signal reporters by exploiting DNA frameworks to realize multidimensional molecular classification, which resulted in precise PCa diagnosis (an AUC of 100%) with six biomarkers across three-dimensional datatypes (Supplementary Information). Given the ever-increasing amount of molecular information from the gene, RNA, protein and metabolomic profiling of diseases, our multidimensional molecular classifiers for analysing multidimensional molecular biomarkers shed light on precision diagnosis and therapy.
The study was approved by the Ethics Committee at Renji Hospital, School of Medicine, Shanghai Jiao Tong University. All methods were performed in accordance with these approved guidelines.
The workflow for the classification of real clinical samples is presented in Supplementary Fig. 35. Recognition probes for each target were first modified on the electrode. The read-out of the electrochemical signal of the multidimensional target was performed by weighting the capture of the recognition probe with the PAN reporter. The final classification of clinical samples was achieved by a diagnostic function. The cost per patient is only US$6.3 (Supplementary Table 4).
Data availability and simulations
The miRNA data
The miRNA data for PCa patient analysis were from GPL8227 (Agilent-019118 Human miRNA Microarray 2.0 G4470B; miRNA ID version). This dataset included 113 PCa patients and 28 healthy individuals. Each individual was described by 881 miRNAs, such as miR-183. On the basis of the t-test results, 171 miRNAs with a highly significant difference between patients and healthy individuals were selected. Tree-based feature selection from the sklearn library was then used to select the top related miRNAs (miR-183 and miR-153).
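The first filtering stage, a per-descriptor t-test between patients and healthy individuals, can be sketched as follows. The text does not state the t-test variant or the significance cut-off, so this sketch assumes Welch's unequal-variance statistic and a simple cut-off on its absolute value; the function names, feature names and values are hypothetical, and the subsequent tree-based selection step is not shown.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def filter_features(patients, healthy, t_cutoff=2.0):
    """Keep descriptors whose |t| reaches the cut-off, largest |t| first.

    patients and healthy map a descriptor name to its list of values.
    t_cutoff is an assumed threshold, not a value taken from the study.
    """
    kept = []
    for name in patients:
        t = welch_t(patients[name], healthy[name])
        if abs(t) >= t_cutoff:
            kept.append((name, t))
    return sorted(kept, key=lambda item: -abs(item[1]))


# Hypothetical expression values for two descriptors (illustrative only).
patients = {"feature_up": [5.1, 5.3, 4.9, 5.2], "feature_flat": [1.0, 1.2, 0.9, 1.1]}
healthy = {"feature_up": [3.0, 3.2, 2.9, 3.1], "feature_flat": [1.1, 0.9, 1.0, 1.2]}

selected = filter_features(patients, healthy)
print([name for name, _ in selected])
```

Filtering by a univariate test first reduces the number of descriptors passed to the tree-based selection, which then ranks the remaining candidates by their contribution to the classification.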
The mRNA data
The mRNA data for the PCa patient analysis were from GPL10264 (Human Exon 1.0 ST Array; CDF file, HuEx_1_0_st_v2_main_A20071112_EP.cdf) and recorded the Affymetrix gene expression of 150 PCa patients and 29 healthy individuals. The descriptors were reduced from 43,419 to 6,148 by a t-test, and then to two items (NM_170675 (MEIS2) and NM_004560 (ROR2)) by tree-based feature selection.
The clinical dataset was from the literature47. It contained 70 PCa patients and 32 healthy individuals. After the same treatment as described above, PSA and SO were identified as the most important features.
From gene expression data (GPL10264), we obtained 150 PCa patients and 29 healthy individuals and employed a tree-based feature selection method to screen for the two most related aberrant expressed genes. We selected ROR2 and MEIS2 from 43,419 items (dataset 1). Similarly, we analysed the miRNA profiling (GPL8227) with 113 PCa patients and 28 healthy individuals, and selected two important miRNAs (miR-153 and miR-183) by feature selection (dataset 2). In addition, PSA and SO were selected as protein and small molecule biomarkers, respectively, from the clinical data (70 PCa patients and 32 healthy individuals; dataset 3).
The missing data were replaced by the average of each descriptor among the same group. In all, the combined dataset had 422 samples; among these were 333 PCa patients and 89 healthy individuals. Each individual was described by six selected descriptors.
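The group-wise imputation described above can be sketched as follows. The cohort sizes are taken from the text; the function name `impute_by_group_mean`, the record layout and the example values are hypothetical and serve only to illustrate the replacement of missing values by the mean of the same group.

```python
from collections import defaultdict

# Cohort sizes from the three datasets: (PCa patients, healthy individuals).
cohorts = {"mRNA": (150, 29), "miRNA": (113, 28), "clinical": (70, 32)}
n_pca = sum(p for p, _ in cohorts.values())
n_healthy = sum(h for _, h in cohorts.values())


def impute_by_group_mean(samples, descriptors):
    """Replace None by the mean of that descriptor within the same group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for s in samples:
        for d in descriptors:
            if s[d] is not None:
                sums[(s["group"], d)] += s[d]
                counts[(s["group"], d)] += 1
    filled = []
    for s in samples:
        row = dict(s)  # copy, so the input records are left unchanged
        for d in descriptors:
            if row[d] is None:
                key = (row["group"], d)
                row[d] = sums[key] / counts[key]
        filled.append(row)
    return filled


# Hypothetical records with one descriptor (illustrative values only).
records = [
    {"group": "PCa", "PSA": 4.0},
    {"group": "PCa", "PSA": None},
    {"group": "PCa", "PSA": 6.0},
    {"group": "healthy", "PSA": 1.0},
    {"group": "healthy", "PSA": None},
]
filled = impute_by_group_mean(records, ["PSA"])
print(n_pca, n_healthy, n_pca + n_healthy)
print([r["PSA"] for r in filled])
```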
Tree-based feature selection from sklearn was used for feature selection, and the logistic regression module from sklearn was applied to classify the two-category model. To find the integer weights of each descriptor, an exhaustive search method hunted through the whole integer parameter space from –4 to 4. The accuracy, precision, recall and F1-score of every model were calculated and recorded. The classification analysis was implemented by the Classification learning app in MATLAB (R2020b).
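The exhaustive search over integer weights can be sketched as below. This is a simplified illustration under stated assumptions: the score is a plain weighted sum compared against a threshold of zero, candidates are ranked by precision and then accuracy, and the toy data use two descriptors instead of six. The function name `exhaustive_weight_search` and the data are hypothetical, and the sketch does not reproduce the logistic regression models used in this work.

```python
from itertools import product


def exhaustive_weight_search(X, y, weight_range=range(-4, 5), threshold=0.0):
    """Try every integer weight vector and keep the best (precision, accuracy).

    X is a list of descriptor tuples, y a list of booleans (True = patient).
    """
    n_features = len(X[0])
    best_key, best_weights = None, None
    for w in product(weight_range, repeat=n_features):
        if not any(w):
            continue  # skip the all-zero vector, which carries no information
        pred = [sum(wi * xi for wi, xi in zip(w, x)) > threshold for x in X]
        tp = sum(p and t for p, t in zip(pred, y))
        fp = sum(p and not t for p, t in zip(pred, y))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
        key = (precision, accuracy)
        if best_key is None or key > best_key:
            best_key, best_weights = key, w
    return best_weights, best_key


# Toy data: descriptor 1 is elevated in patients, descriptor 2 in controls.
X = [(2.0, 0.5), (1.8, 0.4), (0.5, 1.5), (0.4, 2.0)]
y = [True, True, False, False]

weights, (precision, accuracy) = exhaustive_weight_search(X, y)
print(weights, precision, accuracy)
```

With six descriptors and nine integer values per weight, the full space contains 9^6 = 531,441 candidates, which is small enough to enumerate directly.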
The concentration and weight correlation of the molecular classifier were calibrated with the standard samples for different targets before diagnosis applications. The concentration of the standard was quantified by the UV absorbance at 260 nm by the Shanghai Institute of Measurement and Testing Technology. (The certificate of the standard samples is provided in Supplementary Information and Supplementary Table 5.)
Synthesis and purification of DTF-based PAN reporter
All DNA strands were mixed in TM buffer to synthesize the DTF structures (the proportions are illustrated in Supplementary Tables 6–19). The mixture was heated to 95 °C for 15 min, and cooled to 4 °C for at least 20 min by using a PTC-200 thermal cycler DNA engine (MJ Research, USA). We purified the synthesized DTF structures according to the method reported in the literature50. Our PAN reporter is simple to prepare and can be successfully synthesized even by undergraduate students without any knowledge in this field (Supplementary Fig. 37 and Supplementary Table 20). Moreover, we were able to achieve millilitre-level (7.5 ml) synthesis using a metal blocker for the bulk preparation of PAN (Supplementary Fig. 38). The PAN was tested and characterized through PAGE after being stored in buffer solution or serum for 1, 3, 7 and 15 days. As shown in Supplementary Fig. 39, PAN remained stable in the buffer solution even after 15 days and stable in the serum for at least a day. Thus, PAN reporters can be prepared in bulk and preserved for long periods, with potential for practical clinical applications (Supplementary Table 21). The stability of the DTF at the interface was also examined, to adapt it to interfacial applications. After being modified with both Cy3 and Cy5 on the same edge of the DTF, we found that the DTF can be stable at the interface for up to five days, as determined with fluorescence resonance energy transfer and dual fluorescence co-localization (Supplementary Fig. 40).
Weighting system for miRNA information translation
The purified DTF for short-strand RNA interface capture (1 μM, 6 μl) was incubated on the cleaned electrode overnight at room temperature. The electrodes were then passivated by methylcyclohexane (2 mM), polyethylene glycol 2000 (2 mM) and 10% bovine serum albumin. After that, the electrodes were washed with phosphate-buffered saline (PBS) and dried with nitrogen. Next, the samples were dropped on the electrode surface and incubated for 2 h at 25 °C. The PAN reporter (50 nM) was added on the electrode surface and incubated for 2 h at 25 °C. Finally, 4 μl of avidin-HRP was added on the electrode surface for 15 min to bind to the biotin in the molecular reporter. After being washed thoroughly, the electrodes were immersed in TMB solution buffer for electrochemical measurements.
Weighting system for mRNA information translation
The purified DTF dimer for long-strand RNA interface capture (1 μM, 6 μl) was incubated on the cleaned electrode overnight at room temperature. The passivation and signal translation steps for mRNA were the same as those for miRNA.
Weighting system for PSA information translation
The purified DTF for PSA interface capture (1 μM, 6 μl) was incubated on the cleaned electrode overnight at room temperature. After being washed twice with PBST buffer and once with PBS buffer, anti-PSA monoclonal antibody (coating; a monoclonal antibody is highly uniform and specific to a single epitope) (L1; 100 μM, 6 μl) was dropped on the chip electrode and incubated at room temperature for 2 h to form the fixed probe, and then the electrode was washed twice with PBST buffer and once with PBS buffer. Subsequently, a series of PSA samples in PBS buffer (6 μl) at variable concentrations were dropped on the chip electrodes and incubated at 37 °C for 1 h. After washing, anti-PSA monoclonal antibody (labelling) (L2; 100 μM, 6 μl) was dropped on the chip electrode and incubated at room temperature for 2 h to form the capture probe. The chip electrode was washed twice with PBST buffer and once with PBS buffer. After that, the PAN reporter (100 μM, 6 μl) was dropped on the chip electrode and incubated at room temperature for 2 h to form the weighting probe. Finally, excess avidin-HRP was dropped onto the electrode and incubated at room temperature for 15 min. After washing twice with PBST buffer and once with PBS buffer, electrochemical testing was performed immediately. The sequences of L1 and L2 are shown in the Supplementary Tables.
Weighting system for SO information translation
The purified DTF for SO interface capture (1 μM, 6 μl) was incubated on the cleaned electrode overnight at room temperature. The electrodes were passivated with 0.13% methylcyclohexane, 20 mg ml–1 polyethylene glycol and 1% casein in sequence for 1 h. The diluted sample solution was incubated on the 16-channel electrodes for 2 h at room temperature (6 μl). After 2 h, the 16-channel electrodes were washed with the washing buffer. The PAN reporter was incubated on the electrodes for 2 h and then washed with PBS buffer. Finally, excess avidin-HRP was dropped onto the electrode and incubated at room temperature for 15 min. After washing twice with PBST buffer and once with PBS buffer, electrochemical testing was performed immediately.
All electrochemical measurements were done on a Model 1040C (CH Instruments). The working gold 16-channel electrode, the auxiliary electrode and the reference electrode, integrated in the chip, were used. Cyclic voltammetry was carried out at a scan rate of 100 mV s–1. The current was recorded at –100 mV after the steady state of the HRP catalytic reaction was reached30.
Biomarker panel screening using molecular classifier
In the biomarker panel screening experiments, we used the fluorescent signal chip system. The 500 nM miRNA capture probes (Supplementary Tables 16 and 17) were printed by a microarray robot (Nano-Plotter NP2.1). After incubating overnight, the chip was blocked with 2 mM polyethylene glycol 2000 for 45 min and 2% bovine serum albumin for 1 h. The diluent for clinical samples was added on the chip and incubated for 2 h at 25 °C. The PAN reporter with different weights (50 nM) was then added on the chip and incubated for 2 h at 25 °C. Then the chip was imaged by a GenePix 4100A microarray scanner. We obtained signals of four targets with four weights for 12 patients. By adding and subtracting combinations of them, 2,048 (8^4/2) diagnostic formulas were obtained. Finally, the optimal formulas were filtered by cluster analysis.
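The count of 2,048 diagnostic formulas can be reproduced by enumeration if each of the four miRNAs carries a weight of magnitude 1 to 4 with either sign (eight options per miRNA), and if formulas that differ only by an overall sign are counted once. The second condition is an interpretation of the factor of one half, assumed here for illustration; the names `canonical` and `formulas` are hypothetical.

```python
from itertools import product

magnitudes = (1, 2, 3, 4)
# Eight signed options per miRNA: +1, -1, +2, -2, +3, -3, +4, -4.
signed_weights = [sign * m for m in magnitudes for sign in (1, -1)]


def canonical(weights):
    """Represent a formula and its overall sign flip by one tuple.

    Assumption: Result and -Result give the same grouping, so the version
    whose first weight is positive is kept (no weight is ever zero).
    """
    return weights if weights[0] > 0 else tuple(-w for w in weights)


all_assignments = list(product(signed_weights, repeat=4))
formulas = {canonical(w) for w in all_assignments}

# Weights ordered as (miR-32, miR-96, miR-153, miR-183).
optimal = (3, -1, 1, -2)
print(len(all_assignments), len(formulas), optimal in formulas)
```

Each retained tuple defines one candidate formula of the form Result = Σ w_i C_i, which is then evaluated on the 12 patient samples and passed to the cluster analysis.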
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
The data that support the plots within this paper and other findings of this study are available from the corresponding author upon reasonable request. Furthermore, the miRNA, mRNA, PSA and SO data used in this study are available in ref. 47 and the National Center for Biotechnology Information database, https://www.ncbi.nlm.nih.gov/genome.
Collins, F. S. & Varmus, H. A new initiative on precision medicine. N. Engl. J. Med. 372, 793–795 (2015).
Thomasian, N. M., Kamel, I. R. & Bai, H. X. Machine intelligence in non-invasive endocrine cancer diagnostics. Nat. Rev. Endocrinol. 18, 81–95 (2022).
Vargas, A. J. & Harris, C. C. Biomarker development in the precision medicine era: lung cancer as a case study. Nat. Rev. Cancer 16, 525–537 (2016).
Nassiri, F. et al. Detection and discrimination of intracranial tumors using plasma cell-free DNA methylomes. Nat. Med. 26, 1044–1047 (2020).
Krzywinski, M. & Savig, E. Multidimensional data. Nat. Methods 10, 595 (2013).
Luo, Y. et al. A multidimensional precision medicine approach identifies an autism subtype characterized by dyslipidemia. Nat. Med. 26, 1375–1379 (2020).
Larance, M. & Lamond, A. I. Multidimensional proteomics for cell biology. Nat. Rev. Mol. Cell Biol. 16, 269–280 (2015).
Cohen, J. D. et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930 (2018).
Berger, B., Peng, J. & Singh, M. Computational solutions for omics data. Nat. Rev. Genet. 14, 333–346 (2013).
Crichton, D. J. et al. Cancer biomarkers and big data: a planetary science approach. Cancer Cell 38, 757–760 (2020).
Liang, H. et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat. Med. 25, 433–438 (2019).
Kristensen, V. N. et al. Principles and methods of integrative genomic analyses in cancer. Nat. Rev. Cancer 14, 299–313 (2014).
Komori, T. The 2021 WHO classification of tumors, 5th edition, central nervous system tumors: the 10 basic principles. Brain Tumor Pathol. 39, 47–50 (2022).
Blanc, T., El Beheiry, M., Caporal, C., Masson, J. B. & Hajj, B. Genuage: visualize and analyze multidimensional single-molecule point cloud data in virtual reality. Nat. Methods 17, 1100–1102 (2020).
Adamcova, M. & Šimko, F. Multiplex biomarker approach to cardiovascular diseases. Acta Pharmacol. Sin. 39, 1068–1072 (2018).
Subramanian, I., Verma, S., Kumar, S., Jere, A. & Anamika, K. Multi-omics data integration, interpretation, and its application. Bioinf. Biol. Insights https://doi.org/10.1177/1177932219899051 (2020).
Montaner, J. et al. Multilevel omics for the discovery of biomarkers and therapeutic targets for stroke. Nat. Rev. Neurol. 16, 247–264 (2020).
Tarazona, S., Arzalluz-Luque, A. & Conesa, A. Undisclosed, unmet and neglected challenges in multi-omics studies. Nat. Comput. Sci. 1, 395–402 (2021).
Tarazona, S. et al. Harmonization of quality metrics and power calculation in multi-omic studies. Nat. Commun. 11, 3092 (2020).
Lopez de Maturana, E. et al. Challenges in the integration of omics and non-omics data. Genes 10, 238 (2019).
Benenson, Y., Gil, B., Ben-Dor, U., Adar, R. & Shapiro, E. An autonomous molecular computer for logical control of gene expression. Nature 429, 423–429 (2004).
Seelig, G., Soloveichik, D., Zhang, D. Y. & Winfree, E. Enzyme-free nucleic acid logic circuits. Science 314, 1585–1588 (2006).
Lopez, R., Wang, R. & Seelig, G. A molecular multi-gene classifier for disease diagnostics. Nat. Chem. 10, 746–754 (2018).
Zhang, C. et al. Cancer diagnosis with DNA molecular computation. Nat. Nanotechnol. 15, 709–715 (2020).
Yao, G. et al. Meta-DNA structures. Nat. Chem. 12, 1067–1075 (2020).
Yao, G. et al. Programming nanoparticle valence bonds with single-stranded DNA encoders. Nat. Mater. 19, 781–788 (2020).
Li, J. et al. Encoding quantized fluorescence states with fractal DNA frameworks. Nat. Commun. 11, 2185 (2020).
Wiraja, C. et al. Framework nucleic acids as programmable carrier for transdermal drug delivery. Nat. Commun. 10, 1147 (2019).
Zhang, T. et al. Design, fabrication and applications of tetrahedral DNA nanostructure-based multifunctional complexes in drug delivery and biomedical treatment. Nat. Protoc. 15, 2728–2757 (2020).
Song, P. et al. Programming bulk enzyme heterojunctions for biosensor development with tetrahedral DNA framework. Nat. Commun. 11, 838 (2020).
Lin, M. et al. Programmable engineering of a biosensing interface with tetrahedral DNA nanostructures for ultrasensitive DNA detection. Angew. Chem. Int. Ed. 54, 2151–2155 (2015).
Woehrstein, J. B. et al. 100-nm metafluorophores with digitally tunable optical properties self-assembled from DNA. Sci. Adv. 3, e1602128 (2017).
Ulbrich, M. H. & Isacoff, E. Y. Subunit counting in membrane-bound proteins. Nat. Methods 4, 319–321 (2007).
Hearty, S., Leonard, P. & O’Kennedy, R. Barcodes check out prostate cancer. Nat. Nanotechnol. 5, 9–10 (2010).
Hill, H. D. & Mirkin, C. A. The bio-barcode assay for the detection of protein and nucleic acid targets using DTT-induced ligand exchange. Nat. Protoc. 1, 324–336 (2006).
Nam, J.-M., Thaxton, C. S. & Mirkin, C. A. Nanoparticle-based bio-bar codes for the ultrasensitive detection of proteins. Science 301, 1884–1886 (2003).
Zebda, A. et al. Mediatorless high-power glucose biofuel cells based on compressed carbon nanotube-enzyme electrodes. Nat. Commun. 2, 370 (2011).
de Jong, O. G. et al. A CRISPR-Cas9-based reporter system for single-cell detection of extracellular vesicle-mediated functional transfer of RNA. Nat. Commun. 11, 1113 (2020).
Zhao, Z. et al. Nanocaged enzymes with enhanced catalytic activity and increased stability against protease digestion. Nat. Commun. 7, 10619 (2016).
He, L. et al. Transducing complex biomolecular interactions by temperature-output artificial DNA signaling networks. J. Am. Chem. Soc. 142, 14234–14239 (2020).
Li, H., Brouwer, C. R. & Luo, W. A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data. Nat. Commun. 13, 1901 (2022).
Lin, M. et al. Electrochemical detection of nucleic acids, proteins, small molecules and cells using a DNA-nanostructure-based universal biosensing platform. Nat. Protoc. 11, 1244–1263 (2016).
Gorog, D. A. et al. Current and novel biomarkers of thrombotic risk in COVID-19: a Consensus Statement from the International COVID-19 Thrombosis Biomarkers Colloquium. Nat. Rev. Cardiol. 19, 475–495 (2022).
Schwarzenbach, H., Hoon, D. S. B. & Pantel, K. Cell-free nucleic acids as biomarkers in cancer patients. Nat. Rev. Cancer 11, 426–437 (2011).
Xiao, B. et al. Plasma microRNA panel is a novel biomarker for focal segmental glomerulosclerosis and associated with podocyte apoptosis. Cell Death Dis. 9, 533 (2018).
Bhanvadia, R. R. et al. MEIS1 and MEIS2 expression and prostate cancer progression: a role for HOXB13 binding partners in metastatic disease. Clin. Cancer Res. 24, 3668–3680 (2018).
Kumar, D., Gupta, A., Mandhani, A. & Sankhwar, S. N. Metabolomics-derived prostate cancer biomarkers: fact or fiction? J. Proteome Res. 14, 1455–1464 (2015).
Rajakumar, T. et al. A blood-based miRNA signature with prognostic value for overall survival in advanced stage non-small cell lung cancer treated with immunotherapy. npj Precis. Oncol. 6, 19 (2022).
Nassiri, F. et al. A clinically applicable integrative molecular classification of meningiomas. Nature 597, 119–125 (2021).
Li, F. et al. Ultrafast DNA sensors with DNA framework-bridged hybridization reactions. J. Am. Chem. Soc. 142, 9975–9981 (2020).
This work was financially supported by the National Natural Science Foundation of China (T2188102, 22025404, 22001168); National Key R&D Program of China (2021YFF1200300); China National Postdoctoral Program for Innovative Talents (BX2021190) by the China Postdoctoral Science Foundation; Innovative Research Team of High-Level Local Universities in Shanghai (SHSMU-ZLCX20212602); 2022 Shanghai ‘Science and Technology Innovation Action Plan’ Fundamental Research Project (22JC1401202); Shanghai Jiao Tong University Scientific and Technological Innovation Funds (21X010202096) and Shanghai Municipal Health Commission (2022JC027).
The authors declare no competing interests.
Peer review information
Nature Nanotechnology thanks Hao Yan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Supplementary Figs. 1–41, Tables 1–23, Discussion, Notes and References.
Cite this article
Yin, F., Zhao, H., Lu, S. et al. DNA-framework-based multidimensional molecular classifiers for cancer diagnosis. Nat. Nanotechnol. (2023). https://doi.org/10.1038/s41565-023-01348-9