Main

Precision medicine calls for the development of a disease-specific molecular classification method that accurately reflects clinical behaviour1,2,3,4. A consistent research trend has been to obtain massive amounts of data on multidimensional molecules, including DNA/RNA, proteins and small molecules, which triggers growing interest in using multiple molecular datatypes to better classify diseases2,5,6,7,8,9,10,11,12. For example, the World Health Organization incorporated molecular indicators (for example, cyclin-dependent kinase inhibitor 2A/B homozygous deletion and an isocitrate dehydrogenase mutant) for the classification of tumours of the central nervous system in the 2021 revision of the World Health Organization classification, providing illustrative examples of the new paradigm of integrated molecular classification13. Nevertheless, the heterogeneity of data obtained from various types of technologies accordingly increases and raises grand challenges in data integration and interpretation14,15,16,17. Examples include the heterogeneity in measurement sensitivity between RNA sequencing and chromatin immunoprecipitation sequencing, which causes significant gene expression variations that cannot be mirrored by chromatin modifications18. Hence, extensive computing-intensive data filtering and systematic normalization are indispensable to enable effective multidimensional data integration19,20.

Advances in developing in silico classifiers coupled with DNA-reaction-based molecular implementation provide a powerful and potentially generalizable means of molecular classification21,22 (Fig. 1a,b). Seelig and coworkers designed an in silico classifier model that could translate parameters and mathematical functions into a class of DNA probe reporters to realize multi-gene classification for the diagnosis of early cancers and respiratory infections23. Similarly, Han and coworkers demonstrated a molecular classifier that could analyse different microRNAs (miRNAs) in lung cancer serum samples with a diagnosis precision of 86.4% (ref. 24). The binding events between a target (DNA/RNA) and multiple single-stranded DNA reporters were uniformly translated to an assignment of weights for in silico analysis. However, the extension of this method to the dimensions of proteins or metabolic small molecules is difficult to implement due to the heterogeneous nature of these binding processes. A remaining challenge to realize DNA-based multidimensional molecular classifiers is thus to develop a signal reporter that can translate the heterogeneous, multidimensional molecular information into a unified output signal in a programmable manner (Fig. 1a).

Fig. 1: A PAN-reporter-based multidimensional molecular classifier for cancer diagnosis.

a, Schematic illustration of multidimensional molecular classifier. The multidimensional molecular information is translated into a unified sensing signal. The classifier then handles the sensing signals to produce the interpretable yes/no answers. b, Scheme illustrating a conventional analysis based on one molecule, a single-dimensional molecular classifier and a multidimensional molecular classifier for cancer diagnosis. c, Schematic showing how an in silico classifier was trained and validated with publicly available data. Each of the multidimensional molecular targets was assigned a weight to represent its importance. d, The n-valence PAN enabled the construction of a valency-controlled signal reporter for multidimensional molecules. e,f, Image (e) and schematic illustration (f) of the multidimensional molecular classifier coupled with an electrochemical array with 16 Au electrodes. The arrow represents a biological sample being applied.

The precision and programmable nature of the Watson–Crick base pairing of DNA delivers a spectrum of valence-controlled programmable atom-like nanostructures (PANs) for colloidal assembly with different compositions, sizes, chiralities and linearities25,26. In particular, self-assembled DNA tetrahedral frameworks (DTFs) provide a simple means to fabricate three-dimensional PANs with an ordered structure and versatile modification27,28,29,30. Here we introduce a PAN-based molecular classifier that can physically implement the computational classification of multidimensional molecular clinical data. The atom-like and programmable nature of a DTF supports the design of valence-controlled PAN signal reporters, resulting in linearity in translating virtually any class of molecular binding to unified electrochemical sensor signals (Supplementary Fig. 1). We demonstrate that the use of a PAN reporter allows precise weight assignment for multidimensional molecular information in computational classification, which is employed to interactively analyse a panel of six biomarkers across three-dimensional datatypes (RNA, protein and metabolic small molecule) for the classification of prostate cancer (PCa) patients. Moreover, we further developed a diagnosis panel screening system using PAN reporters for a classification related to the Gleason score.

Construction and characterization of PAN reporters

Figure 1c,d shows the general design principle for a DNA-encoded molecular classifier, which physically implements an in silico classifier for multidimensional molecular data with an electrochemical sensing system (Fig. 1e,f and Supplementary Fig. 2). To produce unified electrochemical sensing signals across heterogeneous molecular binding events, we designed valence-encoded PAN reporters using DTF-based PANs with n valence capable of targeting each target molecule across multiple dimensions. More importantly, we envisioned that the use of valence-encoded PAN reporters might encode PANs with a defined number of signal moieties, allowing for the physical implementation of a weight assignment (for example, 1, 2 or n) of the in silico classifier by anchoring 1, 2 or n signal moieties on a PAN reporter. Then, the signal gain from each target molecule would be linearly proportional to the number of signal moieties on the PAN reporter, which enables one to weigh each target molecule according to its importance in the in silico classifier.

To fabricate DTF-based PAN reporters, we first assembled a DTF containing a handle DNA on a vertex by mixing seven DNA fragments of 58 nucleotides (58-nt) and one handle DNA-containing DNA fragment of 81-nt in stoichiometric equivalents in buffer (Supplementary Fig. 3). We heated the mixture to 95 °C and then rapidly cooled it to 4 °C. The DTFs were assembled with a high yield of ~95%, characterized by atomic force microscopy (AFM; Supplementary Fig. 4) and polyacrylamide gel electrophoresis (PAGE; Supplementary Fig. 5). We measured a typical edge length of ~12 nm for DTFs (37 base pairs for each edge), which was consistent with its theoretical length31. To form the PAN reporter containing more anchoring sites of signal moieties, we coupled one DTF to another DTF to form DTF dimer structures through the hybridization of a linker DNA and the handle DNA in the two DTFs. The DTF dimer we formed had a dumbbell-shaped structure (with ~95% yield), as shown by AFM and PAGE imaging (Fig. 2a and Supplementary Figs. 5 and 6).

Fig. 2: The design of valence-encoded signal reporters using PANs.

a, In the multidimensional molecular classifier, a signal reporter was generated to translate each molecule target into a unified output signal and linearly programmed signal gain. We employed a DTF-based PAN reporter to implement the information translation (left). Wm, Wmi, Wp and Ws were defined as the weight for different biomarkers. The structure of the PAN reporter was confirmed with an AFM image (right, top). The height–length measurement of the PAN reporter (right, bottom) indicates the two DTFs were successfully coupled through the hybridization of the linker DNA and handle DNA. Scale bar, 10 nm. The AFM experiments were repeated three times, and one representative image is shown. b, Fluorescence mapping (left) and intensity (right) of the PAN reporter with different numbers (n) of Cy3. The center schematics show the structures. Scale bar, 2 mm. c, Correlation between the number and fluorescence intensity of each defined number of Cy3 dyes (shown in the diagram along the top) on the PAN reporter. The images of fluorescence dots are shown at the top. Scale bar, 500 nm. Error bars indicate standard deviations (mean ± s.d., n = 20). d, Six steps of photobleaching traces (red arrows) were observed when the number of Cy3 dyes was six on a single PAN reporter (inset schematic). The images of fluorescence dots modified with different numbers of dyes are shown at the top. Scale bar, 500 nm. e, Correlation between the number and fluorescence intensity of each controlled number, with two types of fluorophores, Alexa Fluor 488 and Cy5, on the PAN reporter (shown on the diagrams in the center). Error bars indicate standard deviations (mean ± s.d., n = 20). The images of fluorescence dots modified with different numbers of dyes are shown at the top (Alexa Fluor 488) and bottom (Cy5). Scale bars, 500 nm. f, TEM images illustrate the precisely controlled number of AuNPs on the PAN reporter, connected with white dashed lines and also shown in the schematics at the top. Scale bar, 10 nm. g, The formation of the PAN reporter anchored with different numbers of HRP and confirmed with AFM. Scale bar, 100 nm.

To validate the valence-encoded PAN reporters, which may encode PANs with a defined number of signal moieties, we employed fluorophore labels (for example, cyanine-3 (Cy3)) as signal moieties on PAN reporters and characterized the precise number of signal moieties on the PAN reporters via a single-molecule technique, total internal reflection fluorescence microscopy (TIRFM). PAN reporters containing a defined number of signal moieties (1, 2 or n) were realized by anchoring 1, 2 or n fluorophores on the vertices of a DTF dimer. We observed that the fluorescence intensity of the PAN reporters in bulk solution was linearly proportional to the number of signal moieties (R² > 0.986; R², coefficient of determination; Fig. 2b). Moreover, aggregation-caused quenching is not expected to occur because the fluorophores are separated by the ~12 nm edge length of the DTF32 (Supplementary Fig. 7). Similarly, the fluorescence intensity of a single PAN reporter increased linearly as the number of Cy3 labels increased from one to six (R² > 0.998) in TIRFM measurements (Fig. 2c and Supplementary Fig. 8a). Moreover, we observed stepwise single-molecule fluorescence photobleaching33: six photobleaching steps appeared in the trace of the PAN reporter containing six Cy3 labels, and one to five steps appeared in the traces of PAN reporters containing one to five Cy3 labels (Fig. 2d and Supplementary Figs. 8b and 9). Thus, the numbers of the signal moieties on PAN reporters were precisely controlled from one to six.

We next asked whether PAN reporters possess the orthogonality to accommodate programmed multicolour reporters. We anchored two types of fluorophores with different emissions but without fluorescence resonance energy transfer (Supplementary Fig. 10) on the PAN reporters. To this end, six fluorophores were anchored on single PAN reporters with various number combinations of Alexa Fluor 488 and Cy5. The fluorescence intensity and the number of photobleaching steps of the PAN reporters were linearly proportional to the numbers of each type of fluorophore, without interference between the two (Fig. 2e and Supplementary Fig. 11). For example, when we anchored one Alexa Fluor 488 fluorophore and five Cy5 fluorophores on a single PAN reporter, we observed one photobleaching step in the Alexa Fluor 488 trace and five photobleaching steps in the Cy5 trace. Thus, the anchoring sites of the PAN reporter were individually controlled, with a defined number of signal moieties even in the presence of multiple distinct signal moieties.

To demonstrate the generality in labelling multiple types of signal moieties on the PAN reporter, we anchored various signalling moieties, including gold nanoparticles (AuNPs; usually used as a signal moiety for mass or colorimetric output)34,35,36 and enzymes (usually used as a signal moiety for fluorescent, colorimetric or electrocatalytic output)37,38,39,40. We visualized the spatial structure of the PAN reporter anchored with AuNPs via transmission electron microscopy (TEM) with a precisely controlled number from one to six (Fig. 2f and Supplementary Fig. 12). Interestingly, the spatial arrangement of AuNPs coincided with the vertices’ arrangement on the DTF dimers, indicating that the signal moieties were well anchored on the PAN reporter. We then used horseradish peroxidase (HRP) as an example to anchor on the PAN reporter. The AFM images showed a precise number of HRPs from one to six on the PAN reporter, as shown in Fig. 2g and Supplementary Fig. 13.

Molecular implementation of weight assignment

An in silico classifier realizes data classification via the assignment of a numerical weight to each piece of data that represents its importance, and then summing the weighted result41. Analogously, a multidimensional molecular classifier translates each molecular input with a weighted sensing signal representing its importance by designing a valence-encoded PAN reporter to program the unified electrochemical sensing signal for the multidimensional molecules.

We developed the weighting system for multidimensional molecules with PAN reporters (Fig. 3a). The essential role of the system was to facilitate the binding event between the probe and target molecule to trigger a weighted electrochemical signal. We used DTFs to pattern recognition probes on the electrode surface according to our previous reports42, leading to a uniform biorecognition layer. We employed a sandwich configuration to translate the molecular binding event into the recruitment of the PAN reporter on the electrode for RNAs and proteins. For example, for RNAs (messenger RNA (mRNA) or miRNA), a single-stranded DNA probe was used as the recognition probe, where base-pairing interactions capture the target RNAs on the electrode surface (Supplementary Fig. 14a,b). The PAN reporter then specifically recognized the overhang portion of the probe–target complex and translated the presence of the target RNAs into a weighted electrochemical signal with HRP as the signalling molecule (Fig. 3b). For proteins, a specific monoclonal antibody was used to capture the target protein on the electrode. Another antibody was then used to form an antibody–protein–antibody sandwich for the target protein (Fig. 3b and Supplementary Fig. 14c). For small molecules, we used an aptamer–DNA duplex as the recognition probe. The small-molecule-to-aptamer binding triggered the release of DNA on the electrode surface, which recruits the PAN reporter via a hybridization between the released DNA and the DNA linker on the PAN reporter (Fig. 3b and Supplementary Fig. 14d). Thus, we designed the weighting system for all the major dimensions of biologically relevant molecules, indicating the generality of our PAN reporter for the weight assignment in multidimensional molecules (Fig. 3a,b).

Fig. 3: A PAN reporter-based weighting system for multidimensional molecules.

a, Scheme of the weighting system for analysing multidimensional molecules on a gold electrode. b, Schematic illustration of the recognition and weighting for multidimensional molecules. The system facilitated a recognition binding event between the probe and target molecule, which triggered a weighted electrochemical signal. The top shows mRNA, miRNA or protein, while the bottom shows a small molecule. c, Electrochemical signals (shown as current versus time curves) of four types of target (miRNA, small molecule, mRNA and protein, illustrated in the insets) with precise weight assignment (W). The long axis is an isometric zoom of the short axis to show the values more clearly. d, Generality of our weighting system for multiple biomarkers. e, Validation of the two-dimensional molecular classifier. PSA and the MEIS2 gene were used as biomarkers to verify the multidimensional molecular classifier. Left: the individual PSA or MEIS2 gene analysis. Right: the classification based on the linear classifier model for PSA and MEIS2.

We experimentally implemented this weighting system by designing a weight assignment with one to six HRPs using a PAN reporter for multidimensional molecules (for example, miRNA, mRNA, proteins and small molecules). The electrochemical signal corresponding to the weight assignments was recorded after the addition of the targets until a steady electrochemical signal was achieved. We observed that the signals were linearly proportional to the weights that were realized through controlling the number of HRP on the PAN reporter (R² > 0.997) for an RNA of 78-nt, a miRNA of 22-nt, an antigen of ~30,000 daltons and a small molecule with 13 atoms. Thus, this system was suitable for assigning an integer-valued weight to different targets (Fig. 3c).
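The linearity check described above can be sketched as an ordinary least-squares fit of signal against weight, followed by the coefficient of determination. This is only an illustration: the function name `linear_fit_r2` is hypothetical, and the readings are made-up placeholder values, not measurements from this study.

```python
def linear_fit_r2(xs, ys):
    """Fit y = slope * x + intercept by least squares and return (slope, intercept, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical example: weights 1..6 (number of HRPs) and placeholder
# signal gains in arbitrary units. These are NOT data from the study.
weights = [1, 2, 3, 4, 5, 6]
signals = [0.52, 1.01, 1.49, 2.03, 2.48, 3.02]
slope, intercept, r2 = linear_fit_r2(weights, signals)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.4f}")
```

An R² close to 1 in such a fit is what supports treating the number of HRPs as an integer-valued weight.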

To further demonstrate the generality of the design, we applied the weighting system to 12 additional biomarkers, including COVID-19 biomarkers (open reading frame 1ab (ORF 1ab), envelope gene (E gene) and nucleocapsid gene (N gene))43; cancer biomarkers (mRNA ROR2, mRNA MEIS2 and circulating tumor DNA ALU115)44; and disease-related miRNAs (miR-21, miR-26a, miR-375, miR-144, miR-153 and miR-183)45. We achieved a signal gain of 3.35 μA for ORF 1ab at a concentration of 1,000 copies μl−1 (~1.66 fM), indicating successful signal translation (Fig. 3d). Analogously, we observed a signal gain of 3.75 μA for ALU115 at a concentration of 1 fM.
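The conversion from copies per microlitre to molar concentration quoted above can be verified with a few lines of arithmetic. The sketch below uses the defined value of the Avogadro constant; the function name `copies_per_ul_to_molar` is illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def copies_per_ul_to_molar(copies_per_ul):
    """Convert a copy-number concentration (copies per microlitre) to mol per litre."""
    copies_per_litre = copies_per_ul * 1e6  # 1 litre = 1e6 microlitres
    return copies_per_litre / AVOGADRO


conc_fm = copies_per_ul_to_molar(1000) * 1e15  # mol/l -> fmol/l
print(f"{conc_fm:.2f} fM")  # prints "1.66 fM"
```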

We further explored the implementation of the weighting system in complex, biologically relevant matrices, including four types of diluted human body fluid (sweat, serum, urine and saliva) and five types of mouse tissue homogenate (heart, kidney, lung, stomach and liver). We observed efficient signal translation and substantial signal gains for the target molecules, indicating that our weighting system is suitable for complex biological samples (Supplementary Figs. 15 and 16).

Validation of the two-dimensional molecular classifier

To experimentally validate a two-dimensional molecular classifier, we employed prostate-specific antigen (PSA), a biomarker in PCa diagnosis, and MEIS2, an mRNA biomarker related to PCa, as the target biomarkers (Fig. 3e)46. We assigned a positive weight of +3 to PSA and a negative weight of –3 or –1 to MEIS2. A positive weight represents a positive correlation with disease and a negative weight represents a negative correlation, while the magnitude indicates the importance of the biomarker. We prepared 64 mimetic samples by mixing these two biomarkers in different concentration combinations (Supplementary Table 1) and measured these biomarkers using our PAN reporter (Fig. 3e, left). After analysing the data via the function $\text{Result} = 3C_{\text{PSA}} - 3C_{\text{MEIS2}}$ (where $C$ denotes concentration), we found that the 64 samples were classified into two groups, in agreement with our classifier design (Fig. 3e, right). Moreover, when we changed the weight of MEIS2 from –3 to –1 via the function $\text{Result} = 3C_{\text{PSA}} - C_{\text{MEIS2}}$, those samples were also classified into two groups but with a different thresholding boundary compared with $\text{Result} = 3C_{\text{PSA}} - 3C_{\text{MEIS2}}$ (Supplementary Fig. 17).
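A minimal sketch of the two-biomarker linear classifier described above. It makes two assumptions the text does not state: concentrations are given in normalized, comparable units, and the decision threshold is zero. The function name `classify_two_marker` is hypothetical.

```python
def classify_two_marker(c_psa, c_meis2, w_psa=3, w_meis2=-3, threshold=0.0):
    """Return (result, is_positive) for Result = w_psa * C_PSA + w_meis2 * C_MEIS2.

    The weights default to the +3 / -3 pair from the text; the zero threshold
    and the normalized concentration units are assumptions for illustration.
    """
    result = w_psa * c_psa + w_meis2 * c_meis2
    return result, result > threshold


# Hypothetical normalized concentrations, not values from Supplementary Table 1.
print(classify_two_marker(2.0, 1.0))              # (3.0, True)
print(classify_two_marker(1.0, 2.0))              # (-3.0, False)
print(classify_two_marker(1.0, 2.0, w_meis2=-1))  # (1.0, True)
```

The last call shows how changing the MEIS2 weight from –3 to –1 moves the decision boundary: the same sample falls on the other side.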

In silico training for PCa diagnosis

Next, we attempted to scale up our molecular classifier and employ multidimensional data to classify PCa patients. The workflow is illustrated in Supplementary Fig. 18. To obtain an in silico classifier model for PCa patients’ classification, we used publicly available gene and miRNA profiling data from Gene Expression Omnibus, as well as PSA and sarcosine measurement data from previous works47, for classifier training (Fig. 4a). We analysed the distributions of the multidimensional molecules between the healthy individuals and PCa patients, and the selected molecules were distinguishable between these two groups (Supplementary Figs. 19–22). We further investigated the classification models with our classifier, and the robust validation capabilities were confirmed (Supplementary Figs. 23–25).

Fig. 4: In silico training of linear molecular classifier to discriminate PCa patients and healthy individuals.

a, Left: scheme of computational training using publicly available multidimensional molecular information. Right: the expression (Exp.) information of the multidimensional molecules was used for the computational training. b, Selected combinations and their associated weights as a linear classifier model. c, Confusion matrix analysis of 85 samples with a multidimensional classifier. From the analysis data, we obtained a precision score of 100% with the linear classifier model. d,e, Standard deviation analysis of training (d) and validation sets (e) indicates the performance of the multidimensional classifier for the diagnosis of PCa.

We integrated the three datasets into a large dataset to evaluate the application for multidimensional molecules and searched the weight combinations by using several logistic regression models with different optimized emphases (Supplementary Fig. 26). We then selected the precision-optimized model to avoid overtreatment (Fig. 4b,c). The optimal weights obtained were miR-153 (weight = –1), miR-183 (weight = +4), ROR2 (weight = –2), MEIS2 (weight = –3), PSA (weight = +3) and sarcosine (SO; weight = +1). With this set of weights, we achieved a recognition sensitivity of 80%, specificity of 100%, F1-score of 97%, area under the receiver operating characteristic (ROC) curve of 97%, precision of 100% and accuracy of 95% for the validation set (Fig. 4c and Supplementary Fig. 26c; the parameters are presented in Supplementary Table 2). Further, we compared the training and validation sets using standard deviation analysis of the multidimensional targets for PCa diagnosis (Fig. 4d,e). The classifier showed excellent specificity and sensitivity, and its molecular implementation was feasible.
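The six integer weights reported above define a simple weighted sum. The sketch below shows that sum only. It assumes each biomarker level has already been normalized to a comparable scale; the normalization and the decision threshold are not specified here and are left out. The names `WEIGHTS` and `weighted_score` are illustrative.

```python
# Integer weights as reported in the text for the precision-optimized model.
WEIGHTS = {
    "miR-153": -1,
    "miR-183": 4,
    "ROR2": -2,
    "MEIS2": -3,
    "PSA": 3,
    "SO": 1,
}


def weighted_score(levels, weights=WEIGHTS):
    """Sum weight * level over all biomarkers.

    `levels` maps biomarker name to a normalized level (assumption: the
    inputs are already on comparable scales).
    """
    missing = set(weights) - set(levels)
    if missing:
        raise KeyError(f"missing biomarker levels: {sorted(missing)}")
    return sum(weights[name] * levels[name] for name in weights)


# Hypothetical sample with every normalized level equal to 1.0.
sample = {name: 1.0 for name in WEIGHTS}
print(weighted_score(sample))  # 2.0, the sum of the six weights
```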

PCa diagnosis using multidimensional molecular classifier

We first validated the signal-translating performance of the PAN reporter for six biomarkers of PCa. The electrochemical signal of miRNA exhibited a concentration-dependent linear response with a dynamic range of four orders of magnitude. The detection limit for miRNAs was estimated as 100 fM, allowing for the direct analysis of miRNAs for real samples48 (Supplementary Fig. 27). Similarly, we achieved the sensitive detection of mRNA, PSA and SO with dynamic ranges of three to five orders of magnitude. The detection limits were down to 1 pM for mRNA, 0.05 ng ml–1 for PSA and 10 nM for SO (Supplementary Figs. 28–30). The electrochemical signals were also positively correlated to the weights for each biomarker, in agreement with the trends in Fig. 3c. Thus, we successfully established the weight assignment for the six biomarkers (miR-153, miR-183, ROR2, MEIS2, PSA and SO; Supplementary Figs. 31–34).

We then implemented the molecular classifier for the classification of real clinical samples from 32 PCa patients and 50 healthy individuals (the sample information is summarized in Supplementary Table 3). The workflow for clinical sample classification is presented in Supplementary Fig. 35. As shown in Fig. 5a,b, we successfully employed the PAN reporter to convert the six biomarkers into weighted electrochemical signals using the optimized weight sets (Fig. 5c). We realized an accurate classification between PCa patients and healthy individuals with our molecular classifier (P value < 0.01; Fig. 5d). The ROC curve indicated a high predictive power with an area under the curve (AUC) of 100% using our molecular classifier (Fig. 5d). We obtained a specificity of 100% and sensitivity of 100%, with the optimal cut-off value. By contrast, we obtained an AUC of only 54% with a single miRNA (miR-183) and an AUC of 84% with a single mRNA (ROR2; Fig. 5e and Supplementary Fig. 36).
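The AUC values quoted above can be understood through the rank interpretation of the ROC curve: the AUC equals the probability that a randomly chosen patient scores higher than a randomly chosen healthy individual. The sketch below computes it by direct pairwise comparison. It is not the software used in the study, the function name `roc_auc` is illustrative, and the scores are placeholders.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Placeholder classifier outputs, not clinical data.
patients = [3.1, 2.4, 4.0]
healthy = [0.2, 1.1, -0.5, 0.9]
print(roc_auc(patients, healthy))  # 1.0: every patient outscores every healthy sample
```

An AUC of 1.0 therefore means the two groups are perfectly separated by some threshold, whereas 0.5 corresponds to chance-level ranking.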

Fig. 5: Multidimensional molecular classifier for PCa diagnosis.

a, Mean distribution of six biomarkers in samples from 32 PCa patients and 50 healthy individuals. Error bars indicate standard deviations from n = 32 for PCa group and n = 50 for healthy group (mean ± s.d.). The grey boxplots are for healthy individuals and the coloured boxplots are for PCa patients. b, Signal translation of six biomarkers’ content information with weights in clinical samples from PCa patients and healthy individuals. Grey data show the signal translation for healthy individuals, and coloured data show the signal translation for PCa patients. The red dashed line demarcates the data of healthy individuals from those of PCa patients. c, Scheme of precise PCa diagnosis using a multidimensional molecular classifier. Fs is the symbol used for the classification of PCa patients and healthy individuals. d, Left: diagnosis results of the 82 samples using a multidimensional molecular classifier. The black dashed line demarcates the data of healthy individuals from those of PCa patients. Right: ROC analysis for the discrimination between PCa patients and healthy individuals using a multidimensional molecular classifier. e, ROC analysis for the discrimination between PCa patients and healthy individuals using a single-dimensional molecule analysis. In d and e, ‘100%–specificity %’ is the x-axis legend of the ROC curve. In e, AUC is defined as the area under the ROC curve. The red dashed line is the boundary for calculating the area, which is common to both ROC curves (blue and orange).

Biomarker panel screening using molecular classifier

Biomarker panels have the potential to distinguish between patients in various disease processes49 (for example, patients with various Gleason scores for PCa). The rational design of biomarker panels with optimal weighting more accurately reflects the multiple disease processes of cancer. However, the screening of the optimal weighting of each biomarker is challenging. We used serum samples from 12 patients to screen the optimal weighting of the biomarker panel. These comprised four samples with a Gleason score of 6, four samples with a Gleason score of 7 and four samples with a Gleason score of 8 or 9. We used a panel of miRNAs (miR-32, miR-96, miR-153, miR-183) as a model system and assigned weights 1, 2, 3 and 4 to each miRNA using our PAN reporter’s weighting system. Weighted signals from the miRNAs were obtained for 2,048 different weight combinations. The results were used for clustering analysis to screen the optimal weighting set of the biomarker panel (Fig. 6a,b). As shown in Fig. 6c, the top-five correlation analysis allowed for classification into three groups according to the Gleason scores, with the optimal weighted result given as $\text{Result} = 3C_{\text{miR-32}} - C_{\text{miR-96}} + C_{\text{miR-153}} - 2C_{\text{miR-183}}$ (Fig. 6d), indicating the ability of our molecular classifier to perform the biomarker panel screening.
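One way to arrive at 2,048 combinations is sketched below. It assumes that each of the four miRNAs takes a signed weight of magnitude 1 to 4 (eight choices, so 8^4 = 4,096 tuples) and that a tuple and its sign-flipped mirror are counted once because they give the same separation with opposite sign. This reading is an interpretation of the count, not a procedure stated in the text, and the function name `unique_weight_sets` is hypothetical.

```python
from itertools import product

SIGNED_WEIGHTS = [-4, -3, -2, -1, 1, 2, 3, 4]  # assumption: magnitudes 1-4, either sign


def unique_weight_sets(n_markers=4):
    """Enumerate signed weight tuples, keeping one of each (w, -w) mirror pair."""
    seen = set()
    unique = []
    for combo in product(SIGNED_WEIGHTS, repeat=n_markers):
        mirrored = tuple(-w for w in combo)
        if mirrored in seen:
            continue  # the sign-flipped twin was already kept
        seen.add(combo)
        unique.append(combo)
    return unique


weight_sets = unique_weight_sets()
print(len(weight_sets))  # 2048
```

Each retained tuple can then be scored against the Gleason groups, which is the traversal that the clustering analysis summarizes.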

Fig. 6: Diagnosis panel screening using PAN reporters for PCa.

a, Schematic illustration of diagnosis panel screening using PAN reporters. We tested different biomarkers (miR-32, miR-96, miR-153 and miR-183, labelled A–D) in a series of clinical samples with different scores. Several weights (1, 2, 3 and 4) were assigned to the different markers for information transducing, and all combinations of weights were traversed to finally obtain the optimal diagnostic panel. b, The heatmap of the total traversal analysis for 2,048 (= $8^4/2$) weight combinations. c, The heatmap of the top five correlation analysis. d, The clustering analysis with the optimal weight combination (miR-32 with weight of +3, miR-96 with weight of –1, miR-153 with weight of +1 and miR-183 with weight of –2). The red dashed line represents the average of the group with different Gleason scores. Significant clustering related to the Gleason score was observed.

Conclusions

In summary, we developed valence-encoded PAN signal reporters by exploiting DNA frameworks to realize multidimensional molecular classification, which resulted in precise PCa diagnosis (an AUC of 100%) with six biomarkers across three-dimensional datatypes (Supplementary Information). Given the ever-increasing amount of molecular information from the gene, RNA, protein and metabolomic profiling of diseases, our multidimensional molecular classifiers for analysing multidimensional molecular biomarkers shed light on precision diagnosis and therapy.

Methods

The study was approved by the Ethics Committee at Renji Hospital, School of Medicine, Shanghai Jiao Tong University. All methods were performed in accordance with these approved guidelines.

Workflow

The workflow for the classification of real clinical samples is presented in Supplementary Fig. 35. Recognition probes for each target were first modified on the electrode. The read-out of the electrochemical signal of the multidimensional target was performed by weighting the capture of the recognition probe with the PAN reporter. The final classification of clinical samples was achieved by a diagnostic function. The cost per patient is only US$6.3 (Supplementary Table 4).

Data availability and simulations

The miRNA data

The miRNA data for PCa patient analysis were from GPL8227 (Agilent-019118 Human miRNA Microarray 2.0 G4470B; miRNA ID version). This dataset included 113 PCa patients and 28 healthy individuals. For each individual, 881 miRNA descriptors were recorded, such as miR-183. According to the t-test results, 171 miRNA descriptors with highly significant differences between patients and healthy individuals were selected. Tree-based feature selection from sklearn (the scikit-learn Python library) was then used to select the top related miRNAs (miR-183 and miR-153).

The mRNA data

The mRNA data for the PCa patient analysis were from GPL10264 (Human Exon 1.0 ST Array; CDF file, HuEx_1_0_st_v2_main_A20071112_EP.cdf) and recorded the Affymetrix gene expression of 150 PCa patients and 29 healthy individuals. The descriptors were reduced in dimensionality from 43,419 to 6,148 by a t-test, and then to two items (NM_170675 (MEIS2) and NM_004560 (ROR2)) by tree-based feature selection.

Clinical dataset

The clinical dataset was from the literature47. It contained 70 PCa patients and 32 healthy individuals. After the same feature-selection treatment described above, the most important features were PSA and SO.

From gene expression data (GPL10264), we obtained 150 PCa patients and 29 healthy individuals and employed a tree-based feature selection method to screen for the two most related aberrant expressed genes. We selected ROR2 and MEIS2 from 43,419 items (dataset 1). Similarly, we analysed the miRNA profiling (GPL8227) with 113 PCa patients and 28 healthy individuals, and selected two important miRNAs (miR-153 and miR-183) by feature selection (dataset 2). In addition, PSA and SO were selected as protein and small molecule biomarkers, respectively, from the clinical data (70 PCa patients and 32 healthy individuals; dataset 3).

The missing data were replaced by the average of each descriptor among the same group. In all, the combined dataset had 422 samples; among these were 333 PCa patients and 89 healthy individuals. Each individual was described by six selected descriptors.
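The group-wise mean imputation described above can be sketched as follows. The record layout (a list of dictionaries with a `group` key and `None` for missing values) and the function name `impute_group_mean` are assumptions for illustration; the original data handling may differ.

```python
def impute_group_mean(rows, descriptors, group_key="group"):
    """Replace None values with the mean of that descriptor within the same group.

    Returns new row dictionaries and leaves the input rows unmodified.
    """
    # Accumulate (sum, count) of observed values per (group, descriptor).
    stats = {}
    for row in rows:
        for d in descriptors:
            if row[d] is not None:
                total, count = stats.get((row[group_key], d), (0.0, 0))
                stats[(row[group_key], d)] = (total + row[d], count + 1)

    filled = []
    for row in rows:
        new_row = dict(row)
        for d in descriptors:
            if new_row[d] is None:
                # Raises KeyError if a group has no observed value for d.
                total, count = stats[(row[group_key], d)]
                new_row[d] = total / count
        filled.append(new_row)
    return filled


# Hypothetical records, not values from the combined dataset.
records = [
    {"group": "PCa", "PSA": 4.0},
    {"group": "PCa", "PSA": None},
    {"group": "PCa", "PSA": 6.0},
    {"group": "healthy", "PSA": 1.0},
    {"group": "healthy", "PSA": None},
]
print(impute_group_mean(records, ["PSA"]))
```

Imputing within each group, rather than across the whole dataset, avoids pulling patient values towards the healthy mean and vice versa.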

Software

Tree-based feature selection from sklearn was used for feature selection, and the logistic regression module from sklearn was applied to build the two-class classification model. To find the integer weights of each descriptor, an exhaustive search was run over the whole integer parameter space from –4 to 4. The accuracy, precision, recall and F1-score of every model were calculated and recorded. The classification analysis was implemented with the Classification Learner app in MATLAB (R2020b).
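The exhaustive integer search can be sketched in plain Python as below. Several details are assumptions not stated in the text: the decision rule is "weighted sum greater than a fixed threshold", the threshold is zero, and candidate models are ranked by precision first and recall second. The toy data have two descriptors so the search stays small; with six descriptors the same loop visits 9^6 = 531,441 weight vectors. The names `classification_metrics` and `exhaustive_weight_search` are hypothetical.

```python
from itertools import product


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def exhaustive_weight_search(X, y, low=-4, high=4, threshold=0.0):
    """Try every integer weight vector in [low, high]^n and keep the best one.

    Ranking by (precision, recall) and the zero threshold are assumptions.
    """
    best = None
    for w in product(range(low, high + 1), repeat=len(X[0])):
        y_pred = [sum(wi * xi for wi, xi in zip(w, row)) > threshold for row in X]
        m = classification_metrics(y, y_pred)
        key = (m["precision"], m["recall"])
        if best is None or key > best[0]:
            best = (key, w, m)
    return best[1], best[2]


# Toy data with two hypothetical descriptors; not from the study.
X = [[2.0, 0.5], [1.8, 0.4], [0.3, 1.5], [0.2, 2.0]]
y = [True, True, False, False]
weights, metrics = exhaustive_weight_search(X, y)
print(weights, metrics)
```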

Benchmarking

The concentration and weight correlation of the molecular classifier were calibrated with the standard samples for different targets before diagnosis applications. The concentration of the standard was quantified by the UV absorbance at 260 nm by the Shanghai Institute of Measurement and Testing Technology. (The certificate of the standard samples is provided in Supplementary Information and Supplementary Table 5.)

Synthesis and purification of DTF-based PAN reporter

All DNA strands were mixed in TM buffer to synthesize the DTF structures (the proportions are given in Supplementary Tables 6–19). The mixture was heated to 95 °C for 15 min, and cooled to 4 °C for at least 20 min by using a PTC-200 thermal cycler DNA engine (MJ Research, USA). We purified the synthesized DTF structures according to the method reported in the literature50. Our PAN reporter is simple to prepare and can be successfully synthesized even by undergraduate students with no prior experience in this field (Supplementary Fig. 37 and Supplementary Table 20). Moreover, we were able to achieve millilitre-level (7.5 ml) synthesis using a metal blocker for the bulk preparation of PAN (Supplementary Fig. 38). The PAN was tested and characterized through PAGE after being stored in buffer solution or serum for 1, 3, 7 and 15 days. As shown in Supplementary Fig. 39, PAN remained stable in the buffer solution even after 15 days and stable in the serum for at least a day. Thus, PAN reporters can be prepared in bulk and preserved for long periods, with potential for practical clinical applications (Supplementary Table 21). The stability of the DTF at the interface was also examined, to adapt it to interfacial applications. After modifying the same edge of the DTF with both Cy3 and Cy5, we found that the DTF remained stable at the interface for up to five days, as determined by fluorescence resonance energy transfer and dual fluorescence co-localization (Supplementary Fig. 40).

Weighting system for miRNA information translation

The purified DTF for short-strand RNA interface capture (1 μM, 6 μl) was incubated on the cleaned electrode overnight at room temperature. The electrodes were then passivated by methylcyclohexane (2 mM), polyethylene glycol 2000 (2 mM) and 10% bovine serum albumin. After that, the electrodes were washed with phosphate-buffered saline (PBS) and dried with nitrogen. Next, the samples were dropped on the electrode surface and incubated for 2 h at 25 °C. The PAN reporter (50 nM) was added on the electrode surface and incubated for 2 h at 25 °C. Finally, 4 μl of avidin-HRP was added on the electrode surface for 15 min to bind to the biotin in the molecular reporter. After being washed thoroughly, the electrodes were immersed in TMB solution buffer for electrochemical measurements.

Weighting system for mRNA information translation

The purified DTF dimer for long-strand RNA interface capture (1 μM, 6 μl) was incubated on the cleaned electrode overnight at room temperature. The passivation and information translation steps for mRNA were the same as those for miRNA.

Weighting system for PSA information translation

The purified DTF for PSA interface capture (1 μM, 6 μl) was incubated on the cleaned electrode overnight at room temperature. After the electrode was washed twice with PBST buffer and once with PBS buffer, anti-PSA monoclonal antibody (coating; a monoclonal antibody is highly uniform and specific to a single epitope) (L1; 100 μM, 6 μl) was dropped on the chip electrode and incubated at room temperature for 2 h to form the fixed probe, and then the electrode was washed twice with PBST buffer and once with PBS buffer. Subsequently, a series of PSA samples in PBS buffer (6 μl) at variable concentrations were dropped on the chip electrodes and incubated at 37 °C for 1 h. After washing, anti-PSA monoclonal antibody (labelling) (L2; 100 μM, 6 μl) was dropped on the chip electrode and incubated at room temperature for 2 h to form the capture probe. The chip electrode was washed twice with PBST buffer and once with PBS buffer. After that, the PAN reporter (100 μM, 6 μl) was dropped on the chip electrode and incubated at room temperature for 2 h to form the weighting probe. Finally, excess avidin-HRP was dropped onto the electrode and incubated at room temperature for 15 min. After washing twice with PBST buffer and once with PBS buffer, electrochemical testing was performed immediately. The sequences of L1 and L2 are shown in the Supplementary Tables.

Weighting system for SO information translation

The purified DTF for SO interface capture (1 μM, 6 μl) was incubated on the cleaned electrode overnight at room temperature. The electrodes were passivated with 0.13% methylcyclohexane, 20 mg ml–1 polyethylene glycol and 1% casein in sequence for 1 h. The diluted sample solution was incubated on the 16-channel electrodes for 2 h at room temperature (6 μl). After 2 h, the 16-channel electrodes were washed with the washing buffer. The PAN reporter was incubated on the electrodes for 2 h and then washed with PBS buffer. Finally, excess avidin-HRP was dropped onto the electrode and incubated at room temperature for 15 min. After washing twice with PBST buffer and once with PBS buffer, electrochemical testing was performed immediately.

Electrochemical measurements

All electrochemical measurements were done on a Model 1040C (CH Instruments). The working gold 16-channel electrode, the auxiliary electrode and the reference electrode, integrated in the chip, were used. Cyclic voltammetry was carried out at a scan rate of 100 mV s–1. The current was recorded at –100 mV after the steady state of the HRP catalytic reaction was reached30.

Biomarker panel screening using molecular classifier

In the biomarker panel screening experiments, we used the fluorescent signal chip system. The 500 nM miRNA capture probes (Supplementary Tables 16 and 17) were printed by a microarray robot (Nano-Plotter NP2.1). After incubating overnight, the chip was then blocked by 2 mM polyethylene glycol 2000 for 45 min and 2% bovine serum albumin for 1 h. The diluent for clinical samples was added on the chip and incubated for 2 h at 25 °C. The PAN reporter with different weights (50 nM) was then added on the chip and incubated for 2 h at 25 °C. Then the chip was imaged by a GenePix 4100A microarray scanner. We obtained signals of four targets with four weights for 12 patients. By adding and subtracting combinations of them, 2,048 (8⁴/2) diagnostic formulas were obtained. Finally, the optimal formulas were filtered by cluster analysis.
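One way to read the count of 2,048 formulas is sketched below. It assumes that each of the four targets enters a formula with one of four weight levels and a plus or minus sign (eight signed choices per target), and that a formula and its global negation count as one because they separate samples identically. The weight values 1 to 4 and the signal layout are hypothetical.

```python
from itertools import product

WEIGHTS = (1, 2, 3, 4)  # hypothetical labels for the four weight levels
N_TARGETS = 4

# Eight signed choices per target: +1, -1, +2, -2, +3, -3, +4, -4.
signed_choices = [sign * w for w in WEIGHTS for sign in (1, -1)]
all_formulas = list(product(signed_choices, repeat=N_TARGETS))


def canonical(formula):
    """Represent a formula and its global negation by the one whose first term is positive."""
    return formula if formula[0] > 0 else tuple(-c for c in formula)


unique_formulas = {canonical(f) for f in all_formulas}


def evaluate(formula, signals):
    """Sum the signed signals selected by a formula.

    signals[i][w] is the signal measured for target i with weight level w.
    """
    return sum(
        (1 if c > 0 else -1) * signals[i][abs(c)]
        for i, c in enumerate(formula)
    )


# Toy signals for one sample: target i at weight level w reads (i + 1) * w.
toy_signals = [{w: (i + 1) * w for w in WEIGHTS} for i in range(N_TARGETS)]
example_score = evaluate((1, -2, 3, -4), toy_signals)
```

Evaluating every formula in `unique_formulas` on each patient's signals would give the score table on which a cluster analysis could then be run.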

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.