Profiling of circulating tumor DNA (ctDNA) in the bloodstream shows promise for noninvasive cancer detection. Chromatin fragmentation features have previously been explored to infer gene expression profiles from cell-free DNA (cfDNA), but current fragmentomic methods require high concentrations of tumor-derived DNA and provide limited resolution. Here we describe promoter fragmentation entropy as an epigenomic cfDNA feature that predicts RNA expression levels at individual genes. We developed ‘epigenetic expression inference from cell-free DNA-sequencing’ (EPIC-seq), a method that uses targeted sequencing of promoters of genes of interest. Profiling 329 blood samples from 201 patients with cancer and 87 healthy adults, we demonstrate classification of subtypes of lung carcinoma and diffuse large B cell lymphoma. Applying EPIC-seq to serial blood samples from patients treated with PD-(L)1 immune-checkpoint inhibitors, we show that gene expression profiles inferred by EPIC-seq are correlated with clinical response. Our results indicate that EPIC-seq could enable noninvasive, high-throughput tissue-of-origin characterization with diagnostic, prognostic and therapeutic potential.
For each sample profiled in this study, we also provide anonymized fragmentomic data for fragments meeting minimum MAPQ and read FLAG criteria, which are available at https://epicseq.stanford.edu. These data are summarized across TSS regions by fragment size distributions (as in Fig. 1b). Moreover, the anonymized sequencing reads of samples profiled by whole-genome sequencing (n = 3 deep whole-genome; n = 3 whole-genome and n = 24 shallow whole-genome) and by whole-exome sequencing (n = 39 deep WES) are deposited at SRA PRJNA795275.
The custom EPIC-seq software code for fragmentomic featurization and gene expression inference from cfDNA BAM files can be accessed at https://epicseq.stanford.edu/.
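As a rough illustration of the kind of fragmentomic feature described above, the following sketch computes a plain Shannon entropy over the fragment-length histogram of a single promoter window. It is not the released EPIC-seq code: the function name, the toy data and the 100–300 bp size window (borrowed from regions R1–R3 of Extended Data Fig. 1) are assumptions, and any background normalization used by the published method is not reproduced.

```python
import math
from collections import Counter


def fragment_length_entropy(lengths, min_len=100, max_len=300):
    """Shannon entropy (bits) of fragment lengths within [min_len, max_len].

    Hypothetical helper: a plain entropy, without any background correction.
    """
    kept = [n for n in lengths if min_len <= n <= max_len]
    if not kept:
        raise ValueError("no fragments in the requested size range")
    total = len(kept)
    probs = [count / total for count in Counter(kept).values()]
    return sum(-p * math.log2(p) for p in probs)


# Toy fragment lengths (bp) at two hypothetical promoters.
narrow = [167] * 8                                # one dominant size
wide = [120, 140, 160, 167, 180, 200, 240, 280]   # eight equally frequent sizes

entropy_narrow = fragment_length_entropy(narrow)  # single size, entropy 0
entropy_wide = fragment_length_entropy(wide)      # 3.0 bits
```

In this toy setting the broader size distribution yields the higher entropy, which is the direction the captions below associate with highly expressed genes.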
Jahr, S. et al. DNA fragments in the blood plasma of cancer patients: quantitations and evidence for their origin from apoptotic and necrotic cells. Cancer Res. 61, 1659–1665 (2001).
Lo, Y. M. et al. Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. Sci. Transl. Med. 2, 61ra91 (2010).
Heitzer, E., Auinger, L. & Speicher, M. R. Cell-free DNA and apoptosis: how dead cells inform about the living. Trends Mol. Med. 26, 519–528 (2020).
Newman, A. M. et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat. Med. 20, 548–554 (2014).
Phallen, J. et al. Direct detection of early-stage cancers using circulating tumor DNA. Sci. Transl. Med. 9, eaan2415 (2017).
Cohen, J. D. et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930 (2018).
Cristiano, S. et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature 570, 385–389 (2019).
Heitzer, E., Haque, I. S., Roberts, C. E. S. & Speicher, M. R. Current and future perspectives of liquid biopsies in genomics-driven oncology. Nat. Rev. Genet. 20, 71–88 (2019).
Chabon, J. J. et al. Integrating genomic features for non-invasive early lung cancer detection. Nature 580, 245–251 (2020).
Van Opstal, D. et al. Origin and clinical relevance of chromosomal aberrations other than the common trisomies detected by genome-wide NIPS: results of the TRIDENT study. Genet. Med. 20, 480–485 (2018).
Fan, H. C. et al. Non-invasive prenatal measurement of the fetal genome. Nature 487, 320–324 (2012).
Knight, S. R., Thorne, A. & Lo Faro, M. L. Donor-specific cell-free DNA as a biomarker in solid organ transplantation: a systematic review. Transplantation 103, 273–283 (2019).
Chaudhuri, A. A. et al. Early detection of molecular residual disease in localized lung cancer by circulating tumor DNA profiling. Cancer Discov. 7, 1394–1403 (2017).
Lennon, A. M. et al. Feasibility of blood testing combined with PET-CT to screen for cancer and guide intervention. Science 369, eabb9601 (2020).
Zviran, A. et al. Genome-wide cell-free DNA mutational integration enables ultra-sensitive cancer monitoring. Nat. Med. 26, 1114–1124 (2020).
Lo, Y. M. et al. Presence of donor-specific DNA in plasma of kidney and liver-transplant recipients. Lancet 351, 1329–1330 (1998).
Snyder, T. M., Khush, K. K., Valantine, H. A. & Quake, S. R. Universal noninvasive detection of solid organ transplant rejection. Proc. Natl Acad. Sci. USA 108, 6229–6234 (2011).
Lehmann-Werman, R. et al. Identification of tissue-specific cell death using methylation patterns of circulating DNA. Proc. Natl Acad. Sci. USA 113, E1826–1834 (2016).
Jiang, P. et al. Preferred end coordinates and somatic variants as signatures of circulating tumor DNA associated with hepatocellular carcinoma. Proc. Natl Acad. Sci. USA 115, E10925–E10933 (2018).
Sun, K. et al. Orientation-aware plasma cell-free DNA fragmentation analysis in open chromatin regions informs tissue of origin. Genome Res. 29, 418–427 (2019).
Sadeh, R. et al. ChIP–seq of plasma cell-free nucleosomes identifies gene expression programs of the cells of origin. Nat. Biotechnol. 39, 586–598 (2021).
Lui, Y. Y. et al. Predominant hematopoietic origin of cell-free DNA in plasma and serum after sex-mismatched bone marrow transplantation. Clin. Chem. 48, 421–427 (2002).
Fleischhacker, M. & Schmidt, B. Circulating nucleic acids (CNAs) and cancer—a survey. Biochim. Biophys. Acta 1775, 181–232 (2007).
Ramachandran, S., Ahmad, K. & Henikoff, S. Transcription and remodeling produce asymmetrically unwrapped nucleosomal intermediates. Mol. Cell 68, 1038–1053 e1034 (2017).
Snyder, M. W., Kircher, M., Hill, A. J., Daza, R. M. & Shendure, J. Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell 164, 57–68 (2016).
Ivanov, M., Baranova, A., Butler, T., Spellman, P. & Mileyko, V. Non-random fragmentation patterns in circulating cell-free DNA reflect epigenetic regulation. BMC Genomics 16, S1 (2015).
Ulz, P. et al. Inferring expressed genes by whole-genome sequencing of plasma DNA. Nat. Genet. 48, 1273–1278 (2016).
Wu, J. et al. Decoding genetic and epigenetic information embedded in cell free DNA with adapted SALP-seq. Int. J. Cancer 145, 2395–2406 (2019).
Jiang, P. et al. Lengthening and shortening of plasma DNA in hepatocellular carcinoma patients. Proc. Natl Acad. Sci. USA 112, E1317–1325 (2015).
Underhill, H. R. et al. Fragment length of circulating tumor DNA. PLoS Genet. 12, e1006162 (2016).
Mouliere, F. et al. Enhanced detection of circulating tumor DNA by fragment size analysis. Sci. Transl. Med. 10, eaat4921 (2018).
Ulz, P. et al. Inference of transcription factor binding from cell-free DNA enables tumor subtype prediction and early detection. Nat. Commun. 10, 4666 (2019).
Moss, J. et al. Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nat. Commun. 9, 5068 (2018).
Weintraub, H. & Groudine, M. Chromosomal subunits in active genes have an altered conformation. Science 193, 848–856 (1976).
Jiang, P. et al. Plasma DNA end-motif profiling as a fragmentomic marker in cancer, pregnancy, and transplantation. Cancer Discov. 10, 664–673 (2020).
The Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).
The Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).
Puglisi, F. et al. Prognostic value of thyroid transcription factor-1 in primary, resected, non-small cell lung carcinoma. Mod. Pathol. 12, 318–324 (1999).
Maloney, D. G. et al. Phase I clinical trial using escalating single-dose infusion of chimeric anti-CD20 monoclonal antibody (IDEC-C2B8) in patients with recurrent B-cell lymphoma. Blood 84, 2457–2466 (1994).
Ferlay, J. et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int. J. Cancer 136, E359–386 (2015).
Torre, L. A., Siegel, R. L. & Jemal, A. Lung cancer statistics. Adv. Exp. Med. Biol. 893, 1–19 (2016).
Newman, A. M. et al. Integrated digital error suppression for improved detection of circulating tumor DNA. Nat. Biotechnol. 34, 547–555 (2016).
Travis, W. D. et al. The 2015 World Health Organization Classification of lung tumors: impact of genetic, clinical and radiologic advances since the 2004 classification. J. Thorac. Oncol. 10, 1243–1260 (2015).
Reck, M. & Rabe, K. F. Precision diagnosis and treatment for advanced non-small-cell lung cancer. N. Engl. J. Med. 377, 849–861 (2017).
Ettinger, D. S. et al. NCCN guidelines insights: non-small cell lung cancer, version 1.2020. J. Natl Compr. Cancer Netw. 17, 1464–1472 (2019).
Wiener, R. S., Schwartz, L. M., Woloshin, S. & Welch, H. G. Population-based risk for complications after transthoracic needle lung biopsy of a pulmonary nodule: an analysis of discharge records. Ann. Intern. Med. 155, 137–144 (2011).
Bubendorf, L., Lantuejoul, S., de Langen, A.J. & Thunnissen, E. Nonsmall cell lung carcinoma: diagnostic difficulties in small biopsies and cytological specimens: number 2 in the series ‘Pathology for the clinician’ edited by Peter Dorfmuller and Alberto Cavazza. Eur. Respir. Rev. 26, 170007 (2017).
McLean, A. E. B., Barnes, D. J. & Troy, L. K. Diagnosing lung cancer: the complexities of obtaining a tissue diagnosis in the era of minimally invasive and personalised medicine. J. Clin. Med. 7, 163 (2018).
Reck, M. et al. Pembrolizumab versus chemotherapy for PD-L1-positive non-small-cell lung cancer. N. Engl. J. Med. 375, 1823–1833 (2016).
Socinski, M. A. et al. Atezolizumab for first-line treatment of metastatic nonsquamous NSCLC. N. Engl. J. Med. 378, 2288–2301 (2018).
Gandhi, L. et al. Pembrolizumab plus chemotherapy in metastatic non-small-cell lung cancer. N. Engl. J. Med. 378, 2078–2092 (2018).
Hellmann, M. D. et al. Nivolumab plus Ipilimumab in lung cancer with a high tumor mutational burden. N. Engl. J. Med. 378, 2093–2104 (2018).
Camidge, D. R., Doebele, R. C. & Kerr, K. M. Comparing and contrasting predictive biomarkers for immunotherapy and targeted therapy of NSCLC. Nat. Rev. Clin. Oncol. 16, 341–355 (2019).
Nabet, B. Y. et al. Noninvasive early identification of therapeutic benefit from immune checkpoint inhibition. Cell 183, 363–376 e313 (2020).
Menon, M. P., Pittaluga, S. & Jaffe, E. S. The histological and biological spectrum of diffuse large B-cell lymphoma in the World Health Organization classification. Cancer J. 18, 411–420 (2012).
Sehn, L. H. et al. The revised International Prognostic Index (R-IPI) is a better predictor of outcome than the standard IPI for patients with diffuse large B-cell lymphoma treated with R-CHOP. Blood 109, 1857–1861 (2007).
Alizadeh, A. A. et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511 (2000).
Pasqualucci, L. et al. Analysis of the coding genome of diffuse large B-cell lymphoma. Nat. Genet. 43, 830–837 (2011).
Cottereau, A. S. et al. Molecular profile and FDG-PET/CT total metabolic tumor volume improve risk classification at diagnosis for patients with diffuse large B-cell lymphoma. Clin. Cancer Res. 22, 3801–3809 (2016).
Scherer, F. et al. Distinct biological subtypes and patterns of genome evolution in lymphoma revealed by circulating tumor DNA. Sci. Transl. Med. 8, 364ra155 (2016).
Kurtz, D. M. et al. Circulating tumor DNA measurements as early outcome predictors in diffuse large B-cell lymphoma. J. Clin. Oncol. 36, 2845–2853 (2018).
Rosenwald, A. et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med. 346, 1937–1947 (2002).
Basso, K. & Dalla-Favera, R. Germinal centres and B cell lymphomagenesis. Nat. Rev. Immunol. 15, 172–184 (2015).
Dunleavy, K. et al. Differential efficacy of bortezomib plus chemotherapy within molecular subtypes of diffuse large B-cell lymphoma. Blood 113, 6069–6076 (2009).
Thieblemont, C. et al. The germinal center/activated B-cell subclassification has a prognostic impact for response to salvage therapy in relapsed/refractory diffuse large B-cell lymphoma: a bio-CORAL study. J. Clin. Oncol. 29, 4079–4087 (2011).
Scott, D. W. et al. Determining cell-of-origin subtypes of diffuse large B-cell lymphoma using gene expression in formalin-fixed paraffin-embedded tissue. Blood 123, 1214–1217 (2014).
Nowakowski, G. S. et al. Lenalidomide combined with R-CHOP overcomes negative prognostic impact of non-germinal center B-cell phenotype in newly diagnosed diffuse large B-cell lymphoma: a phase II study. J. Clin. Oncol. 33, 251–257 (2015).
Wilson, W. H. et al. Targeting B cell receptor signaling with ibrutinib in diffuse large B cell lymphoma. Nat. Med. 21, 922–926 (2015).
Young, R. M. & Staudt, L. M. Targeting pathological B cell receptor signalling in lymphoid malignancies. Nat. Rev. Drug Discov. 12, 229–243 (2013).
Lenz, G. et al. Stromal gene signatures in large-B-cell lymphomas. N. Engl. J. Med. 359, 2313–2323 (2008).
Zelenetz, A. D. et al. NCCN guidelines insights: B-cell lymphomas, version 3.2019. J. Natl Compr. Cancer Netw. 17, 650–661 (2019).
Hans, C. P. et al. Confirmation of the molecular classification of diffuse large B-cell lymphoma by immunohistochemistry using a tissue microarray. Blood 103, 275–282 (2004).
Lossos, I. S. et al. Prediction of survival in diffuse large-B-cell lymphoma based on the expression of six genes. N. Engl. J. Med. 350, 1828–1837 (2004).
Malumbres, R. et al. Paraffin-based 6-gene model predicts outcome in diffuse large B-cell lymphoma patients treated with R-CHOP. Blood 111, 5509–5514 (2008).
Alizadeh, A. A., Gentles, A. J., Lossos, I. S. & Levy, R. Molecular outcome prediction in diffuse large-B-cell lymphoma. N. Engl. J. Med. 360, 2794–2795 (2009).
Alizadeh, A. A. et al. Prediction of survival in diffuse large B-cell lymphoma based on the expression of 2 genes reflecting tumor and microenvironment. Blood 118, 1350–1358 (2011).
Chapuy, B. et al. Molecular subtypes of diffuse large B cell lymphoma are associated with distinct pathogenic mechanisms and outcomes. Nat. Med. 24, 679–690 (2018).
Ennishi, D. et al. Double-Hit gene expression signature defines a distinct subgroup of germinal center B-cell-like diffuse large B-cell lymphoma. J. Clin. Oncol. 37, 190–201 (2019).
Gentles, A. J. & Alizadeh, A. A. A few good genes: simple, biologically motivated signatures for cancer prognosis. Cell Cycle 10, 3615–3616 (2011).
Chambers, J. & Rabbitts, T. H. LMO2 at 25 years: a paradigm of chromosomal translocation proteins. Open Biol. 5, 150062 (2015).
Royer-Pokora, B. et al. The TTG-2/RBTN2 T cell oncogene encodes two alternative transcripts from two promoters: the distal promoter is removed by most 11p13 translocations in acute T cell leukaemia’s (T-ALL). Oncogene 10, 1353–1360 (1995).
Oram, S. H. et al. A previously unrecognized promoter of LMO2 forms part of a transcriptional regulatory circuit mediating LMO2 expression in a subset of T-acute lymphoblastic leukaemia patients. Oncogene 29, 5796–5808 (2010).
Boehm, T. et al. An unusual structure of a putative T cell oncogene which allows production of similar proteins from distinct mRNAs. EMBO J. 9, 857–868 (1990).
Smale, S. T. & Kadonaga, J. T. The RNA polymerase II core promoter. Annu. Rev. Biochem. 72, 449–479 (2003).
Bernstein, B. E. et al. Genomic maps and comparative analysis of histone modifications in human and mouse. Cell 120, 169–181 (2005).
Wong, I. H. et al. Detection of aberrant p16 methylation in the plasma and serum of liver cancer patients. Cancer Res. 59, 71–73 (1999).
Chim, S. S. et al. Detection of the placental epigenetic signature of the maspin gene in maternal plasma. Proc. Natl Acad. Sci. USA 102, 14753–14758 (2005).
Fernandez, A. F. et al. A DNA methylation fingerprint of 1628 human samples. Genome Res. 22, 407–419 (2012).
Houseman, E. A. et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 13, 86 (2012).
Chan, K. C. et al. Noninvasive detection of cancer-associated genome-wide hypomethylation and copy number aberrations by plasma DNA bisulfite sequencing. Proc. Natl Acad. Sci. USA 110, 18761–18768 (2013).
Lun, F. M. et al. Noninvasive prenatal methylomic analysis by genomewide bisulfite sequencing of maternal plasma DNA. Clin. Chem. 59, 1583–1594 (2013).
Ou, X. et al. Epigenome-wide DNA methylation assay reveals placental epigenetic markers for noninvasive fetal single-nucleotide polymorphism genotyping in maternal plasma. Transfusion 54, 2523–2533 (2014).
Jensen, T. J. et al. Whole genome bisulfite sequencing of cell-free DNA and its cellular contributors uncovers placenta hypomethylated domains. Genome Biol. 16, 78 (2015).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Visel, A. et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457, 854–858 (2009).
Koh, W. et al. Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proc. Natl Acad. Sci. USA 111, 7361–7366 (2014).
Srinivasan, S. et al. Small RNA sequencing across diverse biofluids identifies optimal methods for exRNA isolation. Cell 177, 446–462 e416 (2019).
Ibarra, A. et al. Non-invasive characterization of human bone marrow stimulation and reconstitution by cell-free messenger RNA sequencing. Nat. Commun. 11, 400 (2020).
Zhou, Z. et al. Extracellular RNA in a single droplet of human serum reflects physiologic and disease states. Proc. Natl Acad. Sci. USA 116, 19200–19208 (2019).
Verwilt, J. et al. When DNA gets in the way: a cautionary note for DNA contamination in extracellular RNA-seq studies. Proc. Natl Acad. Sci. USA 117, 18934–18936 (2020).
Schmitz, R. et al. Genetics and pathogenesis of diffuse large B-cell lymphoma. N. Engl. J. Med. 378, 1396–1407 (2018).
Gentles, A. J. et al. The prognostic landscape of genes and infiltrating immune cells across human cancers. Nat. Med. 21, 938–945 (2015).
Binkley, M. S. et al. KEAP1/NFE2L2 mutations predict lung cancer radiation resistance that can be targeted by glutaminase inhibition. Cancer Discov. 10, 1826–1841 (2020).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Alig, S. et al. Short diagnosis-to-treatment interval is associated with increased tumor burden measured by circulating tumor DNA and metabolic tumor volume in diffuse large B-cell lymphoma. J. Clin. Oncol. 39, 2605–2616 (2021).
George, J. et al. Comprehensive genomic profiles of small cell lung cancer. Nature 524, 47–53 (2015).
U, M., Talevich, E., Katiyar, S., Rasheed, K. & Kannan, N. Prediction and prioritization of rare oncogenic mutations in the cancer Kinome using novel features and multiple classifiers. PLoS Comput. Biol. 10, e1003545 (2014).
Venkatraman, E. S. & Olshen, A. B. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23, 657–663 (2007).
Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).
Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).
We thank the patients and families who participated in this study, and we are grateful for all the insightful discussions with the members of the Alizadeh and Diehn laboratories. Funding sources: this work was supported by the National Cancer Institute (M.S.E., grant no. R25CA180993; M.S.E., grant no. T32CA009302; M.M., grant no. F99CA212457, M.D. and A.A.A., grant nos. R01CA188298, R01CA254179, R01CA257655, R01CA244526, R01CA233975, and R01CA229766), the US National Institutes of Health Director’s New Innovator Award Program (M.D.; grant no. 1-DP2-CA186569), the Virginia and D.K. Ludwig Fund for Cancer Research (M.D. and A.A.A.), and philanthropic support from the Bakewell Foundation (to M.D. and A.A.A.), the CRK Faculty Scholar Fund (M.D.), the Troper-Wojcicki Family Gift, The Shanahan Family Lymphoma Fund, Arzang Family Lymphoma Fund, The Skeff Family Lymphoma Fund, The Cane-Nowak Family Foundation, The Marc Benioff Fund, Jewish Communal Fund for Lymphoma Research, The Sara Schottenstein Memorial Fund, and The Moghadam Family Endowed Professorship (to A.A.A.). A.A.A. is a scholar of the Leukemia and Lymphoma Society. J.M.I., C.M.R. and B.T.L. are supported by the MSKCC Support grant no. P30 CA008748 from the NIH. B.Y.N. is a Stanford Cancer Systems Biology Scholar and supported by the NIH (grant no. 5R25CA180993) and by the Postdoctoral Research Fellowship (grant no. 134031-PF-19-164-01-TBG) from the American Cancer Society. Present address for M.M. is the Department of Biotechnology, College of Science, University of Tehran, Iran, and for D.A.K. is the Northwell Health Cancer Institute and Feinstein Institute of Research, Lake Success, NY, USA.
A.A.A. reports research funding from Celgene and Pfizer, ownership interests in FortySeven, CiberMed and ForeSight, and paid consultancy from Roche, Genentech, Janssen, Pharmacyclics, Gilead, Celgene and Chugai. M.D. reports research funding from Varian Medical Systems and Illumina, ownership interests in CiberMed and ForeSight, and paid consultancy from Roche, AstraZeneca, Illumina, RefleXion and BioNTech. J.J.C. reports paid consultancy from Lexent Bio Inc. and ownership interests in ForeSight. A.M.N. has patent filings related to expression deconvolution and cancer biomarkers and has served as a consultant for Roche, Merck and CiberMed. D.M.K. reports paid consultancy from Roche. B.T.L. has served as an uncompensated advisor and consultant to Amgen, Genentech, Boehringer Ingelheim, Lilly, AstraZeneca and Daiichi Sankyo. B.T.L. reports receiving research grants to his institution from Amgen, Genentech, AstraZeneca, Daiichi Sankyo, Lilly, Illumina, GRAIL, Guardant Health, Hengrui Therapeutics, MORE Health and Bolt Biotherapeutics. B.T.L. has received academic travel support from MORE Health and Jiangsu Hengrui Medicine. B.T.L. reports being an inventor on two institutional patents at MSKCC (US62/685,057, US62/514,661) and has intellectual property rights as a book author at Karger Publishers and Shanghai Jiao Tong University Press. J.M.I. reports serving as an unpaid consultant to Amgen and Roche-Genentech, institutional research support from Guardant Health and GRAIL, and ownership interest in LumaCyte. A.A.A., M.D., M.S.E., D.M.K., J.J.C. and B.Y.N. report patent filings related to cancer biomarkers. M.S.E., M.M., A.A.A. and M.D. have patent filings related to this paper. B.Y.N. is currently an employee of Roche/Genentech and holds stock in the company. The remaining authors declare no competing interests.
Peer review information
Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Fragment length density at the transcription start sites varies with gene expression.
Fragment length density at the transcription start sites varies with gene expression. (a) A heatmap of fragment length densities across 1,748 groups of genes (similar to Fig. 1a). Three regions R1 (100–150 bps), R2 (151–210 bps), and R3 (211–300 bps) show enrichment in either high or low expression gene groups. (b) The percent of fragments within each region defined in panel (a) in the deep whole-genome sample across deciles of the reference PBMC gene expression vector, that is, 10 groups of genes when sorted by their expression values in PBMC. Highly expressed genes include fewer monosome fragments, indicating a wider distribution and thereby a higher PFE. (c) Fraction of fragments within the three regions, R1-R3, for exons vs introns vs TSS sites for the top (and bottom) 2000 genes as ranked by expression. The fraction of monosomal fragments within TSS regions is substantially lower than within intronic and exonic regions (63.5% at TSS vs ~71% at non-TSS). Pearson’s Chi-Squared goodness-of-fit tests resulted in the following test statistics (TSS vs Exon: G = 62,133 [P < 2.2E-16]; TSS vs Intron: G = 84,110 [P < 2.2E-16]). (d) Fraction of fragments falling within each region (R1, R2, and R3) for mutant cfDNA fragments and their wildtype counterparts. Each dot represents one tuple (variant-patient) and the connecting lines indicate the paired mutant-wildtype status. These results show that the mutant cfDNA fragments are enriched for R1 and R3 while wildtype fragments are enriched in R2. (e) A contour plot capturing the relationship between expression level (depicted by heat) as a function of two cfDNA fragmentomic features used in the gene inference model: PFE and NDR. (f) ROC analysis of a ‘NSCLC Score’ for noninvasively distinguishing patients with NSCLC from healthy controls (AUC = 0.76). 
The genes comprising this score were first defined from external RNA-Seq profiling data of primary NSCLC tumor tissues and blood samples, allowing subsequent calculation of their corresponding PFE in cfDNA samples profiled by WGS for independent NSCLC cases and healthy controls. (g) A schematic for the analyses performed for Fig. 2d–h. (h) Sample-level ‘SCLC Score’ from deep whole exome analysis of cfDNA and associated diagnostic performance. As in the exercise for NSCLC depicted in panel f, the genes comprising this SCLC score were first defined from external RNA-Seq profiling data of primary SCLC tumor tissues and blood samples. The corresponding PFEs (as the difference between the overall PFE level of top and bottom gene signatures) were subsequently calculated in cfDNA samples we profiled by deep WES for independent SCLC cases and healthy controls. Using these scores, an AUC of 0.9 was achieved in distinguishing cases from controls. (i) The Venn diagram of SCLC high genes identified in cfDNA (whole exome profiling) and tumor biopsy (RNA-Seq transcriptome profiling), with significance of overlap assessed by hypergeometric test.
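The two quantities used in this caption, the fraction of fragments per size region (panels b to d) and a signature score taken as the difference in PFE between top and bottom gene sets (panel h), can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and toy values are hypothetical, and summarizing the "overall PFE level" of a gene set by its mean is an assumption.

```python
# Size regions from panel (a), in bp, with inclusive bounds.
REGIONS = {"R1": (100, 150), "R2": (151, 210), "R3": (211, 300)}


def region_fractions(lengths):
    """Fraction of 100-300 bp fragments falling in each of R1, R2 and R3."""
    kept = [n for n in lengths if 100 <= n <= 300]
    if not kept:
        raise ValueError("no fragments between 100 and 300 bp")
    return {
        name: sum(lo <= n <= hi for n in kept) / len(kept)
        for name, (lo, hi) in REGIONS.items()
    }


def signature_score(pfe_by_gene, high_genes, low_genes):
    """Mean PFE of the 'high' signature minus mean PFE of the 'low' signature.

    Using the mean as the summary statistic is an assumption of this sketch.
    """
    def mean_pfe(genes):
        return sum(pfe_by_gene[g] for g in genes) / len(genes)

    return mean_pfe(high_genes) - mean_pfe(low_genes)


# Toy inputs: gene names and PFE values are placeholders, not study data.
fractions = region_fractions([120, 167, 167, 250])
score = signature_score(
    {"GENE_A": 6.0, "GENE_B": 5.0, "GENE_C": 4.0, "GENE_D": 3.0},
    high_genes=["GENE_A", "GENE_B"],
    low_genes=["GENE_C", "GENE_D"],
)
```

With these toy inputs, half of the fragments fall in R2 and the score is positive because the "high" signature genes carry the larger PFE values.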
Extended Data Fig. 2 Ensemble model accurately predicts gene expression in validation samples.
Ensemble model accurately predicts gene expression in validation samples. (a) The scatterplot of the predicted vs a population-averaged gene expression across 1,748 groups of genes. The underlying data are from a merged cfDNA ‘meta-sample’ (pooled from merger of 27 healthy subjects profiled by relatively shallow WGS), achieving a correlation of 0.9 in initial validation. (b) The meta sample from panel (a) was used to assess model performance, when considering TSS-level expression values without gene grouping (n = 1), as well as scenarios with 2, 3, 5 and 10 genes per group. The Pearson correlation between observed expression in PBMC versus predicted expression from our model (combining PFE and NDR) is shown in green bars. This correlation substantially improves as number of genes per group increases. The Pearson correlation values between observed gene expression and those predicted by NDR or PFE expression are shown in blue and green bars, respectively. (c) Scatterplot depicts predicted versus observed gene expression measurements across 1,748 groups of genes (dots), when comparing expression measurements by RNA-Seq on matched PBMC (x-axis) against plasma cfDNA inferences (y-axis), for a validation sample from a healthy adult that we also profiled by deep WGS (~200x). This achieved a Pearson correlation of 0.86. (d) Similar to panel c, but for a second healthy adult control subject also profiled for validation, by deep WGS of cfDNA and matched RNA-Seq of PBMC (Pearson r = 0.91). (e-f) The same analysis as in panels (a-b) for a meta whole-genome sample generated from healthy subjects from Zviran et al. (g) The whole genome samples (depth ~20-40x) from Zviran et al. were used with every ten genes grouped and the concordance between model-predicted expression and PBMC expression are evaluated using Pearson correlation (that is, each dot is one subject). 
The non-cancer samples show a significantly higher correlation with normal PBMC than lung cancer cases (Wilcoxon P = 0.018). (h) The ichorCNA tumor fraction estimates of the lung cancer cases in panel f are used to compare with the correlations in panel f. As shown in a scatterplot, as tumor fraction increases, the correlation decreases (r = −0.69, P = 0.00052).
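The effect of gene grouping on the reported Pearson correlations (panels b and g) can be illustrated with a short sketch. It assumes that genes are sorted by their reference expression and that consecutive genes are averaged within fixed-size groups; that grouping rule, the function names and the toy numbers are assumptions rather than details taken from the captions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def grouped_means(reference, values, group_size):
    """Average `values` within consecutive groups of genes ordered by `reference`."""
    order = sorted(range(len(reference)), key=lambda i: reference[i])
    groups = [order[i:i + group_size] for i in range(0, len(order), group_size)]
    return [sum(values[i] for i in group) / len(group) for group in groups]


# Toy data: observed (reference) and model-predicted expression for six genes.
observed = [1, 2, 3, 4, 5, 6]
predicted = [2, 1, 4, 3, 6, 5]

r_single = pearson(observed, predicted)  # one gene per group
r_grouped = pearson(
    grouped_means(observed, observed, 2),
    grouped_means(observed, predicted, 2),
)  # two genes per group
```

In this toy case averaging within groups cancels the gene-level noise, so the grouped correlation exceeds the single-gene one, in line with the trend described in panel (b).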
Extended Data Fig. 3 Case-level information of samples profiled by EPIC-Seq.
Cohorts and cell-free DNA samples profiled by EPIC-seq in this study, including Cancer Cases and Control Subjects. (a) Schema depicts the full set of specimens profiled by EPIC-Seq (n = 373), including those meeting Quality Control (QC) criteria (n = 352, 95%). A subset of samples was used for the initial gene expression model tuning (n = 2) and TSS filtering (n = 21). The remaining 329 samples were profiled by EPIC-Seq to address disease-specific questions, including utility for cancer detection, classification of histology and cell-of-origin, and response monitoring. These included 252 samples (76.6%) from 226 subjects that comprised our Discovery/Training cohort (large light purple rectangle), as well as subsequent profiling of a Validation Cohort of 77 samples (23.4%) from 69 subjects, after models were ‘locked down’ (large light green rectangle). In a subset of 22 NSCLC patients, a pair of serial blood samples was monitored for ICI response (to allow comparison of EPIC-Seq and CAPP-Seq and to assess biological plausibility), but this exercise was not subject to any model training. No samples were shared between Training and Validation exercises, with all models locked down before independent validations. Four healthy subjects (4.5%) provided more than one cfDNA specimen, with one used for Training and the second for Validation. (b) Distribution of demographic, clinical, anatomic, and pathological variables for subjects profiled by EPIC-Seq. Tabulated are the relevant indices for cancer cases (235 blood samples from 201 patients), including NSCLC patients (light blue; 109 blood samples from 87 patients), DLBCL patients (light orange; 126 blood samples from 104 patients), and non-cancer control subjects (gray; 94 blood samples from 87 adults).
Extended Data Fig. 4 Correlation between EPIC-lung score and clinical factors.
Concordance between EPIC-Seq measurements and established NSCLC risk factors including metabolic tumor burden, ctDNA level, and ctDNA response. (a) Concordance between EPIC-lung score and metabolic tumor volume (MTV), as measured by Spearman correlation (ρ = 0.67; P = 0.04). (b) Concordance between EPIC-lung score and the ctDNA mean allele fractions as measured by CAPP-Seq, evaluated using Spearman correlation (ρ = 0.5; P = 3E-5). (c) Relationships between genetic versus epigenetic molecular responses to Immune Checkpoint Inhibitor (ICI) therapy in advanced NSCLC. Scatterplot compares molecular responses measured noninvasively by CAPP-Seq (x-axis; fold change, Log10) and EPIC-Seq (lung dynamics score; y-axis) using serial plasma profiling before and after ICI therapy. The two orthogonal measures show moderate but significant correlation (r = 0.53, P = 0.012).
Extended Data Fig. 5 Correlation between EPIC-lymphoma score and clinical factors, results of the validation set and prognostic value of the LMO2 distal promoter.
Concordance between EPIC-Seq measurements and established DLBCL risk factors impacting outcome, including metabolic tumor volume, ctDNA level, and Cell-of-Origin. (a) The boxplots compare EPIC-DLBCL scores between two groups of patients stratified by their metabolic tumor volumes (>220 vs <220 mL; Wilcoxon P = 0.015). (b) Similar to panel a, but for the DLBCL Validation Cohort. (c) Concordance between EPIC-DLBCL scores and ctDNA mean allele fractions (from CAPP-Seq), evaluated using Spearman correlation (ρ = 0.66; P < 2E-16). (d) The EPIC-DLBCL model is applied to the cfDNA profiles of 13 samples from two DLBCL patients (DLBCL002 [ABC] and DLBCL007 [GCB]). The concordance between the resulting scores and the ctDNA mean allele fractions is evaluated by Spearman correlation (ρ = 0.79; P = 0.004). (e) Relationship between DLBCL cell-of-origin EPIC-Seq GCB scores and mutation-based GCB scores as measured by CAPP-Seq in the validation set (Spearman ρ = 0.64, P = 0.01). Each dot represents one sample (related to Fig. 6a). (f) Relationship between EPIC-Seq GCB scores from cfDNA and matched tumor tissue classification by the routine Hans immunohistochemical algorithm in the validation set (Wilcoxon P = 0.001; related to Fig. 6b). (g) Relationship between EPIC-Seq GCB scores from cfDNA and tumor classification by RNA-seq of paired tumor tissue (Jonckheere’s trend test, P = 0.015). Box-and-whisker plots depict the EPIC-Seq GCB score in individual samples profiled by EPIC-Seq (dots), with boxes spanning the inter-quartile range (IQR); the median is marked with a horizontal line in each box, and whiskers span 1.5 times the IQR. (h) Kaplan-Meier curves of EFS for patients labeled by the Hans algorithm. The non-GCB group contains both Non-GCB and Unknown. (i) The violin plot shows the distributions of Cox Proportional Hazard Model Z-scores when genes are grouped according to their effects on outcome (measured as EFS) in three prior tumor studies.
Extended Data Fig. 6 Pre-analytical factors and TSS GC-content correction effect on PFE.
Effect of preanalytical factors on fragment size entropy and effect of GC-content correction on expression model performance. (a) The concordance between PFE values for three healthy controls profiled by EPIC-Seq using paired Streck BCT and K2EDTA tubes. A Pearson correlation of 0.94 was observed between tube types. (b) Effect of time on the bench (in days) on the PFEs in a cohort of plasma cfDNA samples. (c) Effect of additional PCR cycles on PFE. Here we profiled 4 healthy control cfDNA samples with the CAPP-Seq lung cancer selector, adding 3 PCR cycles to study their effect. A Pearson correlation of 0.95 was observed between standard conditions and those incorporating additional PCR cycles. (d) Effect of correction for GC-content of TSS regions on gene expression model accuracy. Four scenarios were studied when correcting features using the GC values for NDR and PFE: PFE alone corrected, NDR alone corrected, both corrected, and neither corrected. The correction was performed using a LOESS function with a span of 0.5. Two healthy control cfDNA samples were profiled by deep whole genome sequencing. For these two subjects, we also profiled the matched PBMC by RNA-Sequencing. We then compared the predicted values from cfDNA against observed values from RNA-Seq for each of the different GC-correction scenarios and tested concordance. The concordance was evaluated using three metrics: Pearson correlation, Spearman correlation, and root-mean-square error (RMSE). When considering both cfDNA samples, none of the GC-correction scenarios consistently improved correlations or reduced the associated errors relative to no correction. (e) Whole-exome profiling of the small-cell lung cancer samples in Fig. 2 was used to investigate the association between PFEs and copy number aberrations. We first determined genes with significantly higher PFE in SCLC cfDNA samples (n = 11) compared with healthy control cfDNA samples (n = 28) (‘High’ PFE). Similarly, we determined genes with significantly lower PFEs in SCLC cfDNA samples (‘Low’ PFE). Then, the copy number states (CNS) corresponding to all genes were identified by overlapping copy number profiles from CNVkit with the genomic coordinates of the first exons. The CNS values were then dichotomized into (i) amplification vs no-amplification and (ii) deletion vs no-deletion. Next, we summarized these by contingency tables for (i) vs PFE levels (top table) and (ii) vs PFE levels (bottom table). Finally, the association between the two was examined via Fisher’s exact test, which showed no significant association in either test (P = 0.97 and P = 0.17 for amplifications and deletions, respectively).
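The contingency-table test described in panel e can be illustrated with a minimal sketch. The counts below are hypothetical placeholders, not the study's data, and `fisher_exact_two_sided` is an illustrative helper written with the Python standard library, not the authors' code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over every table that is no more probable than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical counts (NOT from the paper):
    # rows    = genes with 'High' PFE, genes with 'Low' PFE
    # columns = amplified, not amplified
    table = [[12, 88], [11, 89]]
    p_value = fisher_exact_two_sided(*table[0], *table[1])
    print(f"Fisher's exact test P = {p_value:.3f}")
```

The same function applies unchanged to the deletion vs no-deletion table; a large P value, as reported in the caption, means the data give no evidence that PFE level depends on copy number state.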
Extended Data Fig. 7
Mechanistic model and gene detection sensitivity with various parameters. (a) The cartoon shows four scenarios considered in our simulations: (i) protected, meaning that nucleosomes are well-positioned and are all present, (ii) one nucleosome-free position is present, (iii) two nucleosome-free positions are present and (iv) three nucleosome-free positions are present. (b) The density plots show the fragment lengths generated by the model described in panel a. The three panels correspond to scenarios (ii-iv) vs (i) in a. (c) Effect of a varying mixture parameter on the entropy at three different coverages: 500x, 2500x and 5000x. (d) A summary of panel c for active gene detection sensitivity at a specificity of 85%. The error bars on the sensitivities are calculated using the ‘ci.se’ function in the R pROC package. The colors correspond to the three coverages in panel c.
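The relationship between mixture fraction, coverage and entropy in panels c and d can be illustrated with a toy simulation. This sketch uses plain Shannon entropy over fragment-length counts and a two-component Gaussian mixture. The component means and spreads, the function names, and the use of fragment count as a stand-in for coverage are all assumptions made for illustration; this is neither the mechanistic model of panel a nor the authors' PFE implementation.

```python
import math
import random
from collections import Counter


def shannon_entropy(lengths):
    """Shannon entropy (bits) of the empirical fragment-length distribution."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def simulate_lengths(n_fragments, mix, rng):
    """Draw fragment lengths (bp) from a toy two-component mixture.

    With probability `mix` a fragment comes from a broad component standing in
    for an active, nucleosome-depleted promoter; otherwise it comes from a
    tight mononucleosome-like component. All parameters are hypothetical.
    """
    lengths = []
    for _ in range(n_fragments):
        if rng.random() < mix:
            value = rng.gauss(150, 40)  # broad component (assumed)
        else:
            value = rng.gauss(167, 10)  # tight component (assumed)
        # Clamp to a plausible cfDNA fragment-length range and round to bp.
        lengths.append(max(50, min(400, round(value))))
    return lengths


if __name__ == "__main__":
    rng = random.Random(0)
    for depth in (500, 2500, 5000):  # fragment counts standing in for coverage
        for mix in (0.0, 0.1, 0.5):
            h = shannon_entropy(simulate_lengths(depth, mix, rng))
            print(f"depth={depth:>5} mix={mix:.1f} entropy={h:.2f} bits")
```

In this toy setting, entropy rises with the mixture fraction, and the estimate becomes less noisy as the number of fragments grows, which is the qualitative behavior the panels are summarizing.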
Supplementary Tables 1–4
Supplementary Table 1: List of samples analyzed or profiled in this study. Supplementary Table 2: Gene groups and their corresponding features in the deep whole-genome sample. Supplementary Table 3: Targeted TSSs in the EPIC-seq panel. Supplementary Table 4: Clinical characteristics of the EPIC-seq cohort and scores from different classifiers.
Cite this article
Esfahani, M.S., Hamilton, E.G., Mehrmohamadi, M. et al. Inferring gene expression from cell-free DNA fragmentation profiles. Nat Biotechnol 40, 585–597 (2022). https://doi.org/10.1038/s41587-022-01222-4