Serum biomarkers are often insufficiently sensitive or specific to facilitate cancer screening or diagnostic testing. In ovarian cancer, the few established serum biomarkers are highly specific, yet insufficiently sensitive to detect early-stage disease and to impact the mortality rates of patients with this cancer. Here we show that a ‘disease fingerprint’ acquired via machine learning from the spectra of near-infrared fluorescence emissions of an array of carbon nanotubes functionalized with quantum defects detects high-grade serous ovarian carcinoma in serum samples from symptomatic individuals with 87% sensitivity at 98% specificity (compared with 84% sensitivity at 98% specificity for the current best clinical screening test, which uses measurements of cancer antigen 125 and transvaginal ultrasonography). We used 269 serum samples to train and validate several machine-learning classifiers for the discrimination of patients with ovarian cancer from those with other diseases and from healthy individuals. The predictive values of the best classifier could not be attained via known protein biomarkers, suggesting that the array of nanotube sensors responds to unidentified serum biomarkers.
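The operating point quoted above (sensitivity at a fixed specificity) can be read off any classifier's output scores by choosing a decision threshold. The following is a minimal sketch and not the authors' code: the function name `sensitivity_at_specificity`, the toy scores and the toy labels are all hypothetical, and a sample is assumed to be called positive when its score is at or above the threshold.

```python
def sensitivity_at_specificity(scores, labels, target_specificity):
    """Return (threshold, sensitivity, specificity) for the lowest threshold
    whose specificity reaches target_specificity.

    Assumption: a sample is called positive when score >= threshold.
    labels use 1 for disease and 0 for non-disease.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    # Candidate thresholds: every observed score, plus one above the maximum
    # (which calls every sample negative and so always gives specificity 1).
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    for threshold in candidates:
        true_neg = sum(1 for s in negatives if s < threshold)
        true_pos = sum(1 for s in positives if s >= threshold)
        specificity = true_neg / len(negatives)
        sensitivity = true_pos / len(positives)
        # Thresholds ascend, so specificity only rises and sensitivity only
        # falls; the first threshold that meets the target is the best one.
        if specificity >= target_specificity:
            return threshold, sensitivity, specificity


# Hypothetical classifier scores for eight samples (not study data).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

print(sensitivity_at_specificity(toy_scores, toy_labels, 0.75))  # (0.4, 1.0, 0.75)
print(sensitivity_at_specificity(toy_scores, toy_labels, 1.0))   # (0.7, 0.75, 1.0)
```

Fixing the specificity first and then reporting the sensitivity is what makes two tests comparable at the same false-positive rate.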
The main data supporting the results in this study are available within the paper and its Supplementary Information. Source data for the figures are provided with this paper. The raw datasets generated during the study are too large to be publicly shared, yet they are available for research purposes from the corresponding author on reasonable request.
The custom Python and MATLAB codes for the machine learning and the data analyses reported in this study are not yet publicly available owing to intellectual-property-filing issues, yet they are available for research purposes from the corresponding author on reasonable request.
We thank B. Kwon, S. Chatterjee, A. Chatterjee, M. Fleisher, B. D. Davison, S. David and N. Osiroff for helpful discussions. This work was supported in part by NIH grants R01-CA215719, U54-CA137788, U54-CA132378 and P30-CA008748; the National Science Foundation CAREER Award (1752506); the Honorable Tina Brozman Foundation for Ovarian Cancer Research; the Tina Brozman Ovarian Cancer Research Consortium 2.0; the Kelly Auletta Fund for Ovarian Cancer Research; the American Cancer Society Research Scholar Grant (GC230452); the Pershing Square Sohn Cancer Research Alliance; the Expect Miracles Foundation – Financial Services Against Cancer; the Experimental Therapeutics Center; W. H. Goodwin and A. Goodwin and the Commonwealth Foundation for Cancer Research. M.K. was supported by the Marie-Josée Kravis Women in Science Endeavor Postdoctoral Fellowship. Y.H.W. gratefully acknowledges support from the National Science Foundation (CHE-1904488) and NIH grant (R01-GM114167). H.-B.L. acknowledges the support provided by the China Scholarships Council (CSC No. 201708320366) during his visit to the University of Maryland. P.W. gratefully acknowledges the Millard and Lee Alexander Fellowship from the University of Maryland. M.Z.’s work was NIST internally funded. Y.Y. was supported by a Dean’s Fellowship at Lehigh University. A.J. acknowledges the NHI initiative at Lehigh University.
D.A.H. is a co-founder and officer, with an equity interest, of Goldilocks Therapeutics Inc., Lime Therapeutics Inc. and Resident Diagnostics Inc., and is a member of the scientific advisory board of Concarlo Holdings LLC, Nanorobotics Inc. and Mediphage Bioceuticals Inc. The other authors declare no competing interests.
Peer review information
Nature Biomedical Engineering thanks Kanyi Pu, Steven Skates and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Extended Data Fig. 1 Spectral responses of OCC-DNAs to a small set of HGSOC and benign serum samples.
Four spectral parameters (the intensity and wavelength changes of the E11 and E11⁻ peaks) were extracted from the fluorescence spectra of four serum samples in each group. Each sample was measured in triplicate. Horizontal lines denote the median. Six OCC-DNA nanosensors whose spectroscopic features had p-values below 0.10 were selected for the sensor array.
Extended Data Fig. 2 Spectral responses of the nanosensor array to training and validation sets of patient serum samples (Nsa = 215).
Four spectral parameters were extracted from the fluorescence spectra of the sensor array after a 2-hour serum incubation: a, dint; b, dint*; c, dwl; d, dwl*. Each sample was measured in triplicate.
Extended Data Fig. 3 Averaged F-scores of optimized machine learning models with 10-fold validation.
Samples were classified as HGSOC versus other gynecologic diseases and benign groups. The blue line is the logarithmic regression of the median F-score.
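An averaged F-score under 10-fold validation, as reported in this figure, can be computed along the following lines with scikit-learn. This is a hedged sketch on synthetic data rather than the study's pipeline: the feature matrix, the class sizes, the RBF kernel, `C=1.0`, the feature scaling and the stratified shuffling of folds are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for spectral features (NOT study data):
# two classes of 60 samples, 8 features, class 1 shifted by one unit.
rng = np.random.default_rng(0)
n_per_class, n_features = 60, 8
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, n_features)),
    rng.normal(1.0, 1.0, size=(n_per_class, n_features)),
])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Scaling lives inside the pipeline so it is re-fit on each training fold,
# which keeps information from the held-out fold out of the model.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# Ten stratified folds; each sample is held out exactly once.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_f1 = cross_val_score(model, X, y, cv=cv, scoring="f1")
mean_f1 = fold_f1.mean()

print(f"per-fold F-scores: {np.round(fold_f1, 2)}")
print(f"averaged F-score:  {mean_f1:.2f}")
```

Repeating this for increasing training-set sizes and plotting the median score would give a learning curve of the kind the caption describes.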
a, Fraction of medication dose for HGSOC and other disease patients. b, Chronic conditions, and prevalence thereof, in patients measured in this study. Comorbidity was identified based on the patients’ medication information. c, Anti-cancer drugs or prescription drugs whose occurrence differed by 0.1 or higher between HGSOC and other disease groups.
a, CA125; b, HE4; c, YKL40. The serum protein levels were quantified by automated immunoassay. Dotted lines indicate the clinical reference of each biomarker for HGSOC diagnosis. The error bars denote median ± 95% CI.
Extended Data Fig. 6 Response of OCC-DNA nanosensors to protein HGSOC biomarkers, creatinine, and bilirubin in 20% fetal bovine serum.
The fluorescence spectra were obtained after 2 hours of incubation. Vertical dashed lines indicate the clinical reference of each serum biomarker for HGSOC screening.
Extended Data Fig. 7 Relative feature importance of each spectroscopic variable in the HGSOC binary classification models.
a, Feature importance of each spectral parameter used to train the SVM models, for all OCC-DNA sensors in the arrays tested in this work. Solid lines indicate the median feature importance. b, Correlation of the averaged F-score with the averaged feature importance of each spectroscopic variable. Vertical dashed lines indicate the F-score obtained when all four spectroscopic variables (dint, dint*, dwl, and dwl*) of the OCC-DNA were included as feature vectors in the model development.
Extended Data Fig. 8 Correlation of F-score and r2 of the biomarker prediction models with the relative feature importance of each spectroscopic variable.
For the binary classification models (top rows), samples were divided into two groups (abnormal versus normal levels of serum biomarkers) based on the clinical references (CA125: 50 U/mL, HE4: 150 pM, YKL40: 1650 pM), and the accuracy of predicting abnormal levels of each biomarker was assessed. Feature importance, determined by an ablation study, shows which spectral parameters most affected model performance. Biomarker-dependent variables that were identified in Extended Data Fig. 4 are highlighted in bold. Vertical dashed lines indicate the F-score obtained when all four spectroscopic variables (dint, dint*, dwl, and dwl*) of the OCC-DNA were included as feature vectors in the model development.
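One common way to run an ablation study of this kind is to retrain the model with each variable removed and record the drop in cross-validated F-score. The sketch below illustrates that idea only; it is not the authors' implementation. The helper names `cv_f1` and `ablation_importance`, the linear kernel, the 5-fold split and the synthetic data (in which only `dwl` carries signal) are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The four spectroscopic variables named in the caption.
FEATURES = ["dint", "dint*", "dwl", "dwl*"]


def cv_f1(X, y):
    """Mean cross-validated F-score of a scaled linear SVM (assumed setup)."""
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return cross_val_score(model, X, y, cv=cv, scoring="f1").mean()


def ablation_importance(X, y, names):
    """Importance of a feature = baseline F-score minus the F-score
    obtained after that feature's column is removed."""
    baseline = cv_f1(X, y)
    importance = {}
    for column, name in enumerate(names):
        X_ablated = np.delete(X, column, axis=1)
        importance[name] = baseline - cv_f1(X_ablated, y)
    return baseline, importance


# Synthetic data (NOT study data): 100 samples per class, and only the
# third column ("dwl") is shifted for the positive class.
rng = np.random.default_rng(1)
y = np.array([0] * 100 + [1] * 100)
X = rng.normal(size=(200, len(FEATURES)))
X[:, 2] += 3.0 * y

baseline, importance = ablation_importance(X, y, FEATURES)
print(f"baseline F-score: {baseline:.2f}")
for name, drop in importance.items():
    print(f"{name:>5}: F-score drop {drop:+.2f}")
```

Under this definition a near-zero or slightly negative value means the model performs about as well without that variable, while a large positive value marks a variable the classifier depends on.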
About this article
Cite this article
Kim, M., Chen, C., Wang, P. et al. Detection of ovarian cancer via the spectral fingerprinting of quantum-defect-modified carbon nanotubes in serum by machine learning. Nat. Biomed. Eng 6, 267–275 (2022). https://doi.org/10.1038/s41551-022-00860-y