Superhydrophobic lab-on-chip measures secretome protonation state and provides a personalized risk assessment of sporadic tumour

The secretome of primary cultures is an accessible source of biological markers compared with more complex and less decipherable mixtures such as serum or plasma. The protonation state (PS) of the secretome reflects the metabolism of cells and can be used for early cancer detection. Here, we demonstrate a superhydrophobic organic electrochemical device that measures PS in a drop of secretome derived from liquid biopsies. Using data from the sensor and principal component analysis (PCA), we developed algorithms able to efficiently discriminate tumour patients from non-tumour patients. We then validated the results using mass spectrometry and biochemical analysis of the samples. For the 36 patients across three independent cohorts, the method identified tumour patients with high sensitivity and an identification rate as high as 100% (no false positives), and flagged subjects at risk of sporadic cancer onset through intermediate PS values. This assay could impact cancer risk management and individual diagnosis, and/or help clarify risk in healthy populations.

the mean) and the mean fluorescence intensity (MFI), quantified after a titration curve optimized for each antibody in the panel. PCA maps display 5000 cells from patient and control BDCs (Supporting Figure S1d). Events corresponding to circulating cancer cells were grouped in clusters P2-P8, whereas events corresponding to circulating non-haematological cells were grouped in P1. Each cluster is coloured according to its normalized marker expression on the PCA maps.
The endothelial phenotype CD45 neg CD146 pos was detected in a mean of 4,39 ± 0,6834 cells, particularly in NSCLC and glioblastoma (r = 0,36). The CD45 neg Pan-CK pos and CD45 neg CD326 pos phenotypes were detected in a mean of 232,6 and 40,5 cells, respectively, predominantly in breast, colon, lung and thyroid tumours (r = -0,5). The CD45 neg Vimentin pos and CD45 neg Fibronectin pos phenotypes were found in a mean of 101,4 and 101 cells, respectively, with vimentin prevalent in melanoma and glioblastoma (r = 0,5) and fibronectin in breast cancer (r = 0,4). The cancer stem-like phenotypes CD45 neg CD44 pos (mean of 2,7 ± 0,4 cells) and CD45 neg CD133 pos (20,3 cells) increased with tumour grade (r = 0,4). The epithelial-mesenchymal transition phenotype CD45 neg Pan-CK pos Fibronectin pos was found in 5,6 ± 1,1 cells and increased with grade and stage (r = 0,5 and r = 0,6, respectively).

Supporting Information 2 Clustering analysis of a second cohort of 9 samples
A cohort of 9 subjects, marked with ** in Data file S1, was analysed by SeOECT. This cohort comprised 5 subjects affected by non-cancerous inflammatory disease and 4 healthy subjects. The data suggested that the cultured cells isolated from the liquid biopsies of subjects with no inflammation displayed a higher PS than those of cancer patients and were grouped within the subset of control samples.

Supporting Information 3 Statistical analysis of SeOECT data
Principal Component Analysis of SeOECT data. On the basis of the ANOVA results, PCA was carried out for both the modulation and the time-constant outputs, including the independent outputs from the five different sensors and excluding from the dataset the measurements performed at Vgate values not significantly associated with the "label" of the samples, in order to avoid introducing "noise" into the data-modelling procedure. In this case, too, only "C" (control) and "P" (patient) samples were included in the analysis. PCA performed on the modulation outputs acquired at the V3, V4 and V5 Vgate values gave good results in terms of PC extraction, as the eigenvalues were >1 for the first 3 components, with a cumulative explained variance of 95.7% (Supporting Tables S3.1, S3.2). PCA performed on the tau outputs acquired at the V4 and V5 Vgate values also gave good results in terms of PC extraction, as the eigenvalues were >1 for the first 2 components, with a cumulative explained variance of 92.4% (Supporting Table S4). The weights of the single variables on the extracted components were used to select the "best six" experimental outputs in terms of their ability to discriminate between C and P samples. A matrix scatterplot of the modulation outputs extracted from PC1 (Supporting Figure S3a
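The PCA step above (retain components with eigenvalue > 1, report the cumulative explained variance, then rank outputs by their weights) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' pipeline: it assumes PCA on standardized outputs (i.e. on the correlation matrix), and the rule used to pick the "best six" (largest absolute loading on any retained component) is a hypothetical stand-in, since the exact weighting rule is not given. The data are synthetic.

```python
import numpy as np

def kaiser_pca(X, n_best=6):
    """PCA on standardized outputs, keeping components with eigenvalue > 1.

    X: (n_samples, n_outputs) array, e.g. one column per sensor/Vgate output.
    Returns sorted eigenvalues, number of retained PCs, their cumulative
    explained variance, and indices of the n_best outputs ranked by loading.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each output
    R = np.cov(Z, rowvar=False)                        # equals the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]                  # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    keep = eigvals > 1.0                               # Kaiser criterion
    if not keep.any():
        keep[0] = True                                 # always retain PC1
    cumulative = eigvals[keep].sum() / eigvals.sum()

    # Loadings: eigenvector weights scaled by sqrt(eigenvalue).
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    # Hypothetical ranking rule: largest absolute loading on any retained PC.
    score = np.abs(loadings).max(axis=1)
    best = np.argsort(score)[::-1][:n_best]
    return eigvals, int(keep.sum()), float(cumulative), best

# Synthetic stand-in for SeOECT outputs: 40 samples, 9 outputs, 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 3))
X = latent @ rng.normal(size=(3, 9)) + 0.1 * rng.normal(size=(40, 9))
eigvals, n_pc, cum_var, best = kaiser_pca(X)
print(n_pc, round(cum_var, 3), best)
```

Because the outputs are standardized, the eigenvalues sum to the number of outputs, so "eigenvalue > 1" means a component explains more variance than any single original output.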

Supporting Information 4. The clustering algorithm
We partitioned elements into groups using a density-based clustering algorithm 18. The algorithm classifies elements into categories on the basis of their similarity. Cluster centers are identified as the points in the set that have a higher density than their neighbors and a relatively large distance from points with higher densities. To this end, for each point $i$ in the set we: (i) determine its density $\rho_i$ as the number of points that are closer than a cutoff distance $d_c$ to $i$; (ii) find the subset of points $j$ in the dataset with densities $\rho_j > \rho_i$; (iii) find the point of this subset with the minimum distance to $i$; this distance, $\delta_i$, is the minimum distance of $i$ from points with higher densities than $i$.
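Steps (i) to (iii) can be sketched with NumPy as follows. This is a minimal illustration assuming Euclidean distances and that a point does not count itself in its own density. The densest point(s), which have no denser neighbor, are given their largest distance to any other point; this is a common convention in density-peak clustering that the text does not specify. The function name `density_peaks` and the toy data are hypothetical.

```python
import numpy as np

def density_peaks(points, d_c):
    """Return (rho, delta) for each point, following steps (i)-(iii)."""
    P = np.asarray(points, dtype=float)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)  # pairwise distances
    # (i) rho_i: number of other points closer than the cutoff d_c
    rho = (D < d_c).sum(axis=1) - 1   # subtract 1 to exclude the point itself
    delta = np.empty(len(P))
    for i in range(len(P)):
        higher = np.flatnonzero(rho > rho[i])   # (ii) points with rho_j > rho_i
        if higher.size:
            delta[i] = D[i, higher].min()       # (iii) distance to nearest denser point
        else:
            delta[i] = D[i].max()               # convention: no denser point exists
    return rho, delta

# Two small, well-separated groups in a hypothetical 2D (modulation, time) plane.
A = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1)]
B = [(x + 3, y + 3) for x, y in A]
rho, delta = density_peaks(A + B, d_c=0.12)
gamma = rho * delta                    # heuristic score: large for candidate centers
centers = sorted(int(i) for i in np.argsort(gamma)[::-1][:2])
print(rho.tolist(), centers)           # → [4, 1, 1, 1, 1, 4, 1, 1, 1, 1] [0, 5]
```

Ranking by the product $\rho_i \delta_i$ is one simple way to pick out the points that stand apart in the density/distance diagram; here it returns the two group centres.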
After steps (i) to (iii), we derive a diagram in which the density $\rho_i$ is plotted against $\delta_i$ for each element in the data set. Points that have a higher density than their neighbors and a relatively large distance from points with higher densities emerge as singularities in this diagram, an example of which is reported in Supporting Figure S4.1. These points are the cluster centers. Each point $i$ in the set is then attributed to a cluster $C_k$ on the basis of a minimum-distance criterion: $i$ is attributed to $C_k$ if the minimum distance of $i$ to $C_k$ is the smallest among the minimum distances from $i$ to all the remaining clusters. Clusters are thus constructed by accumulation, and the cluster centers act as their seeds.
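The minimum-distance assignment by accumulation described above can be sketched as follows. This is a minimal illustration under two assumptions not stated in the text: distances are Euclidean, and the non-center points are visited in order of decreasing density, each joining the cluster whose closest member is nearest to it. The function name `assign_by_accumulation` and the toy data are hypothetical.

```python
import numpy as np

def assign_by_accumulation(points, centers, rho):
    """Grow clusters from their centers with a minimum-distance rule.

    Each remaining point joins the cluster whose closest member is nearest
    to it. Points are visited in decreasing density (an assumption).
    """
    P = np.asarray(points, dtype=float)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    labels = np.full(len(P), -1, dtype=int)
    members = []
    for k, c in enumerate(centers):        # cluster centers act as seeds
        labels[c] = k
        members.append([c])
    for i in np.argsort(-np.asarray(rho), kind="stable"):
        if labels[i] != -1:
            continue                       # already a seed
        # Minimum distance from point i to each cluster built so far.
        d_min = [D[i, m].min() for m in members]
        k = int(np.argmin(d_min))
        labels[i] = k
        members[k].append(int(i))          # the cluster grows by accumulation
    return labels

# Hypothetical data: two groups, centers 0 and 5 taken from the density diagram.
A = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1)]
B = [(x + 3, y + 3) for x, y in A]
pts = np.array(A + B, dtype=float)
rho = (np.linalg.norm(pts[:, None] - pts[None], axis=-1) < 0.12).sum(axis=1) - 1
labels = assign_by_accumulation(pts, centers=[0, 5], rho=rho)
print(labels.tolist())                     # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```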
Here, we found that the complete set of tumor and non-tumor samples is partitioned in the modulation/time plane into two separate groups, with all tumor samples gathered in one cluster (say $A$) and all non-tumor samples gathered in the other cluster (say $B$). This is relevant because unsupervised clustering (i) finds, without prior knowledge, that the data set contains two groups with some internal correlation (matching the initial number of sample categories, i.e. tumor and non-tumor), and (ii) associates all elements of a category with the same cluster (revealing that the clusters are internally consistent and that the clustering reflects physical differences between the categories). Suppose now that we have $A$ and $B$ (in practice, $A$ and $B$ are the distributions of the measured tumor and non-tumor samples shown in Figures 3f and 5 of the main text), and that we acquire an additional measurement of an unknown (undetermined) sample $x$. Suppose that $x$ is a tumor sample. If $x$ falls within the convex hull of $A$, the algorithm assigns $x$ to the correct cluster with 100% confidence. If $x$ falls outside the convex hull of $A$, we can assign $x$ to $A$ with probability $p$ and to $B$ with probability $q = 1 - p$. The closer $x$ is to the border of $A$, the greater $p$. We can calculate $p$ and $q$ on a statistical basis.
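Whether a new sample falls within the convex hull of $A$ can be checked in the 2D modulation/time plane as follows. This is a minimal sketch using Andrew's monotone-chain algorithm, a standard technique chosen here for illustration; the function names are hypothetical, points on the hull border are counted as inside, and degenerate hulls (fewer than three distinct vertices) are treated as containing nothing.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                          # build the lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):                # build the upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]         # drop duplicated endpoints

def in_hull(x, hull):
    """True if x is inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False                       # degenerate hull (assumption)
    return all(cross(hull[i], hull[(i + 1) % n], x) >= 0 for i in range(n))

# Hypothetical tumor cluster A in the (modulation, time) plane.
A = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
hull_A = convex_hull(A)
print(hull_A)                              # → [(0, 0), (1, 0), (1, 1), (0, 1)]
print(in_hull((0.5, 0.5), hull_A), in_hull((2, 2), hull_A))  # → True False
```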
We assume that $x$ is drawn from a Gaussian distribution with standard deviation $\sigma(A)$ and mean $\mu = \bar{x}(A) \pm z\,\sigma(A)/\sqrt{n}$, where $n$ is the size of $A$, $\mu$ is the population mean, $\bar{x}(A)$ is the mean of sample $A$, $\sigma(A)/\sqrt{n}$ is the standard error of the mean, and $z$ is the score associated with a specific confidence interval.
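The relation between a confidence level and its score $z$, and the resulting range $\bar{x}(A) \pm z\,\sigma(A)/\sqrt{n}$ for the mean, can be checked numerically with the Python standard library. This sketch assumes two-sided confidence intervals (the convention under which $z = 2.33$ corresponds to 98%) and treats one coordinate at a time; the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_score(confidence):
    """Two-sided z for a confidence level, e.g. 0.98 -> about 2.33."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def mean_range(sample, confidence):
    """Range x_bar -/+ z * s / sqrt(n) for the mean mu of one coordinate."""
    n = len(sample)
    x_bar, s = mean(sample), stdev(sample)   # sample mean and standard deviation
    half_width = z_score(confidence) * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

for cl in (0.98, 0.999, 0.9999):
    print(f"CI = {cl:.2%}: z = {z_score(cl):.2f}")
# CI = 98.00%: z = 2.33
# CI = 99.90%: z = 3.29
# CI = 99.99%: z = 3.89
```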
We generate a large number $N$ of tentative samples $x'$ ($N > 1000$, Supporting Figure S4.2a). Then, we examine whether each $x'$ falls in the first ($x' \to A$) or in the second ($x' \to B$) group; $p$ is determined as the ratio of $x' \to A$ events to $N$. Supporting Figure S4.2b reports $p$ as a function of the size $n$ of sample $A$ for different values of the confidence interval $CI$. The method assigns the unknown sample to the correct cluster with 100% reliability ($p = 1$) and 0% uncertainty ($q = 0$) for any initial sample of size $n > 2$ and fixed confidence interval $CI < 98\%$ ($z = 2.33$). Fixing the confidence interval to $CI = 99.9\%$ ($z = 3.29$), the size of $A$ needed to reach 100% increases to ~15. Thus, for any sufficiently large initial tumor set, the method would diagnose unknown, potentially tumor samples deterministically, i.e. with $q = 0$. The analysis, reported here for the pair of variables 3 − 5, can be performed for any combination of modulation, time constant and sensor number that maximizes the system response, as resulting from PCA post-processing of the data.

Figure S5, measured at sensor S2 and voltage V4 (a), sensor S3 and voltage V5 (b), and sensor S5 and voltage V4 (c). Clustering of the data using non-Euclidean metrics, as explained in the main text, classifies patient (tumor) and control (healthy) subjects with 100% performance in cases (a) and (b) and 87% performance in case (c).
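The Monte Carlo estimate of $p$ and $q$ described in the Supporting Information 4 text can be sketched with NumPy as follows. This is a minimal illustration under several assumptions not fixed by the text: each coordinate is sampled independently, the sign in $\bar{x}(A) \pm z\,\sigma(A)/\sqrt{n}$ is drawn at random, and $x' \to A$ is decided by a Euclidean minimum-distance rule (the main text uses non-Euclidean metrics). The function name and the synthetic groups are hypothetical.

```python
import numpy as np

def assignment_probability(A, B, z, n_draws=2000, seed=0):
    """Monte Carlo estimate of (p, q) for a tumor-like sample x'.

    A, B: (n, 2) arrays of measured samples in the modulation/time plane.
    """
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    rng = np.random.default_rng(seed)
    x_bar = A.mean(axis=0)
    sigma = A.std(axis=0, ddof=1)
    se = sigma / np.sqrt(len(A))                 # standard error of the mean
    # Assumption: the sign in mu = x_bar +/- z*SE is drawn at random per coordinate.
    signs = rng.choice([-1.0, 1.0], size=(n_draws, A.shape[1]))
    mu = x_bar + signs * z * se
    x_prime = rng.normal(loc=mu, scale=sigma)    # N tentative samples x'
    # Minimum-distance rule (Euclidean, an assumption): x' -> A if its
    # nearest member of A is closer than its nearest member of B.
    d_A = np.linalg.norm(x_prime[:, None, :] - A[None, :, :], axis=-1).min(axis=1)
    d_B = np.linalg.norm(x_prime[:, None, :] - B[None, :, :], axis=-1).min(axis=1)
    p = float(np.mean(d_A < d_B))                # fraction of x' -> A events
    return p, 1.0 - p

# Synthetic, well-separated groups (not the measured data).
rng = np.random.default_rng(1)
A = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(10, 2))
B = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(10, 2))
p, q = assignment_probability(A, B, z=2.33)
print(p, q)
```

Repeating the call for different sizes of $A$ and different values of `z` gives curves of the kind summarized in Supporting Figure S4.2b, under the assumptions stated above.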