The analysis of cell-free DNA (cfDNA) in plasma is a rapidly advancing field in medicine. cfDNA consists predominantly of nucleosome-protected DNA shed into the bloodstream by cells undergoing apoptosis. We performed whole-genome sequencing of plasma DNA and identified two discrete regions at transcription start sites (TSSs) where nucleosome occupancy results in different read depth coverage patterns for expressed and silent genes. By employing machine learning for gene classification, we found that the plasma DNA read depth patterns from healthy donors reflected the expression signature of hematopoietic cells. In patients with metastatic cancer, we were able to classify expressed cancer driver genes in regions with somatic copy number gains with high accuracy. For genes with several TSSs, we were able to determine the expressed isoform, as confirmed by RNA-seq analysis of the matching primary tumor. Our analyses provide functional information about cells releasing their DNA into the circulation.
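As a rough illustration of how read depth around a TSS could be turned into two coverage features, the sketch below averages per-base depth in two windows and divides by the mean depth of the supplied profile. It is not the authors' pipeline. The nucleosome-depleted region (NDR) window of −150 bp to +50 bp is taken from Supplementary Figure 1, whereas the ±1,000 bp window for the 2K-TSS feature, the normalization by mean depth, and all function names are assumptions made for illustration.

```python
from statistics import mean

# Window bounds relative to the TSS (offset 0), end-exclusive.
# NDR bounds follow Supplementary Figure 1; the 2K-TSS bounds are an assumption.
NDR_WINDOW = (-150, 50)
TSS_2K_WINDOW = (-1000, 1000)


def window_coverage(depth, tss_index, window, strand="+"):
    """Mean read depth in `window` around the TSS located at `tss_index`.

    `depth` is a list of per-base read depths. For minus-strand genes the
    window is mirrored so that negative offsets always mean "upstream".
    """
    start, end = window
    if strand == "-":
        start, end = -end + 1, -start + 1
    lo, hi = tss_index + start, tss_index + end
    if lo < 0 or hi > len(depth):
        raise ValueError("window extends beyond the depth array")
    return mean(depth[lo:hi])


def normalized_tss_features(depth, tss_index, strand="+"):
    """Return both window coverages divided by the mean depth of `depth`."""
    baseline = mean(depth)
    if baseline == 0:
        raise ValueError("profile has zero mean depth")
    return {
        "2K-TSS": window_coverage(depth, tss_index, TSS_2K_WINDOW, strand) / baseline,
        "NDR": window_coverage(depth, tss_index, NDR_WINDOW, strand) / baseline,
    }


# Synthetic profile: uniform depth of 10 with a dip to 2 across the NDR window.
synthetic_depth = [10] * 4000
for position in range(2000 - 150, 2000 + 50):
    synthetic_depth[position] = 2

print(normalized_tss_features(synthetic_depth, tss_index=2000))
```

In this synthetic profile the NDR value falls well below the 2K-TSS value, which is the kind of contrast a classifier could use to separate expressed from silent genes.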
- Cell-free nucleic acids as biomarkers in cancer patients. Nat. Rev. Cancer 11, 426–437 (2011).
- Circulating tumor cells and DNA as liquid biopsies. Genome Med. 5, 73 (2013).
- Liquid biopsy: monitoring cancer-genetics in the blood. Nat. Rev. Clin. Oncol. 10, 472–484 (2013).
- Liquid biopsies: genotyping circulating tumor DNA. J. Clin. Oncol. 32, 579–586 (2014).
- Circulating tumor DNA as a liquid biopsy for cancer. Clin. Chem. 61, 112–123 (2015).
- Detection and quantification of mutations in the plasma of patients with colorectal tumors. Proc. Natl. Acad. Sci. USA 102, 16368–16373 (2005).
- Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. Sci. Transl. Med. 2, 61ra91 (2010).
- Replicating nucleosomes. Sci. Adv. 1, e1500587 (2015).
- Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell 164, 57–68 (2016).
- Controls of nucleosome positioning in the human genome. PLoS Genet. 8, e1003036 (2012).
- Dynamic regulation of nucleosome positioning in the human genome. Cell 132, 887–898 (2008).
- Histone exchange, chromatin structure and the regulation of transcription. Nat. Rev. Mol. Cell Biol. 16, 178–189 (2015).
- Determinants of nucleosome organization in primary human cells. Nature 474, 516–520 (2011).
- High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA. BMC Med. Genomics 8, 29 (2015).
- Human housekeeping genes, revisited. Trends Genet. 29, 569–574 (2013).
- Predominant hematopoietic origin of cell-free DNA in plasma and serum after sex-mismatched bone marrow transplantation. Clin. Chem. 48, 421–427 (2002).
- Plasma DNA tissue mapping by genome-wide methylation sequencing for noninvasive prenatal, cancer, and transplantation assessments. Proc. Natl. Acad. Sci. USA 112, E5503–E5512 (2015).
- Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proc. Natl. Acad. Sci. USA 111, 7361–7366 (2014).
- ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
- Non-invasive detection of genome-wide somatic copy number alterations by liquid biopsies. Mol. Oncol. 10, 494–502 (2016).
- The dynamic range of circulating tumor DNA in metastatic breast cancer. Breast Cancer Res. 16, 421 (2014).
- Tumor-associated copy number changes in the circulation of patients with prostate cancer identified through whole-genome sequencing. Genome Med. 5, 30 (2013).
- Changes in colorectal carcinoma genomes under anti-EGFR therapy identified by whole-genome plasma DNA sequencing. PLoS Genet. 10, e1004271 (2014).
- Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).
- Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature 497, 108–112 (2013).
- Co-occurrence of MYC amplification and TP53 mutations in human cancer. Nat. Genet. 48, 104–106 (2016).
- Systemic therapy for patients with advanced human epidermal growth factor receptor 2–positive breast cancer: American Society of Clinical Oncology clinical practice guideline. J. Clin. Oncol. 32, 2078–2099 (2014).
- The FGFR landscape in cancer: analysis of 4,853 tumors by next-generation sequencing. Clin. Cancer Res. 22, 259–267 (2016).
- Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl. Med. 6, 224ra24 (2014).
- The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
- Cancer genome landscapes. Science 339, 1546–1558 (2013).
- Whole-genome plasma sequencing reveals focal amplifications as a driving force in metastatic prostate cancer. Nat. Commun. 7, 12008 (2016).
- Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nat. Rev. Genet. 13, 720–731 (2012).
- Non-random fragmentation patterns in circulating cell-free DNA reflect epigenetic regulation. BMC Genomics 16 (Suppl. 13), S1 (2015).
- Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
- The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
- CGHweb: a tool for comparing DNA copy number segmentations from multiple algorithms. Bioinformatics 24, 1014–1015 (2008).
- Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
- Supplementary Figure 1: Mapping of the nucleosome-depleted region. (93 KB)
Localization of the nucleosome-depleted region (NDR), mapped by analyzing 100 (red) and 1,000 (orange) highly expressed genes in the 104 plasma samples from healthy donors. The NDR was most often observed in a window from –150 bp to +50 bp relative to the TSS. Blue, the 1,000 most weakly expressed genes.
- Supplementary Figure 2: Classification of the 5,000 most highly and least expressed genes. (118 KB)
Support vector machine (SVM) classification based on normalized 2K-TSS and NDR coverage for the 5,000 most highly and least expressed genes. Red and dark blue circles represent genes correctly assigned to the expressed and unexpressed clusters, respectively, whereas light blue and orange circles represent incorrectly assigned genes (as in Fig. 3b).
- Supplementary Figure 3: Quantitative relationship between nucleosome occupancy and gene expression. (167 KB)
(a) Correlation between 2K-TSS (left) and NDR (right) coverage and FPKM percentiles. (b) Means and distribution of the 2K-TSS and NDR coverage parameters of genes grouped into deciles. (c) Average FPKM percentile of binned 2K-TSS and NDR coverage parameters.
- Supplementary Figure 4: Comparison of copy number profiles of the matching primary tumor with plasma DNA. (222 KB)
The copy number profiles of the matching primary tumors B7 (top) and B13 (bottom) were obtained by shallow whole-genome sequencing. Pairwise comparison of the profiles, matched by genomic position, revealed high correlation between the primary tumor and plasma DNA copy number profiles (Pearson correlation coefficients of 0.74 (B7) and 0.88 (B13)).
- Supplementary Figure 5: Reconstruction of the 12p11.1 nucleosome array with high-coverage sequenced plasma samples. (100 KB)
Assembly of the 12p11.1 nucleosome array in plasma samples from B7, B13 and controls, with GM12878 shown for comparison.
- Supplementary Figure 6: TSS nucleosome occupancy of unexpressed and housekeeping genes in high-coverage sequenced plasma samples in B7 and B13. (161 KB)
(a,b) Nucleosome occupancy at the TSSs of unexpressed genes (fantom.gsc.riken.jp/5/) and housekeeping genes (ref. 2) showed the expected distinct patterns for B7 (a) and B13 (b).
- Supplementary Figure 7: Distribution of the prediction consent. (28 KB)
Histogram of prediction consent in the merged control data (n = 104). For the majority of genes, the prediction consent was above 95%; only a few genes had a prediction consent below 75%.
- Supplementary Text and Figures (1,648 KB)
Supplementary Figures 1–7 and Supplementary Note.
- Supplementary Table 1 (11 KB)
Genes that rank among the 100 most highly expressed in the plasma RNA-seq data provided by Koh et al. but were predicted to be unexpressed (n = 12).
- Supplementary Table 2 (19 KB)
Genes that rank among the 1,000 most highly expressed in the plasma RNA-seq data provided by Koh et al. but were predicted to be unexpressed (n = 245).
- Supplementary Table 3 (11 KB)
Subsampling of sequencing data to establish a lower boundary of necessary sequencing coverage.
- Supplementary Table 4 (19 KB)
Prediction and FPKM values for genes in focal amplifications having log2 ratio > 1 in breast cancer case B7.
- Supplementary Table 5 (25 KB)
Prediction and FPKM values for genes in focal amplifications having log2 ratio > 1 in breast cancer case B13.
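The log2 ratio > 1 criterion used for Supplementary Tables 4 and 5 can be illustrated with a short sketch. It assumes that the log2 ratio compares library-size-normalized read counts between a tumor plasma sample and a control, and that segments are given as (chromosome, start, end, log2 ratio) tuples. The function names, the input layout and the example values are hypothetical and are not taken from the study.

```python
import math


def log2_ratio(tumor_count, control_count, tumor_total, control_total):
    """log2 of the library-size-normalized tumor/control read count ratio."""
    if min(tumor_count, control_count, tumor_total, control_total) <= 0:
        raise ValueError("all counts must be positive")
    ratio = (tumor_count * control_total) / (control_count * tumor_total)
    return math.log2(ratio)


def genes_in_amplified_segments(genes, segments, threshold=1.0):
    """Return (gene, log2 ratio) pairs for genes whose TSS lies in a segment
    with log2 ratio strictly above `threshold`.

    genes:    iterable of (name, chromosome, tss_position)
    segments: iterable of (chromosome, start, end, log2_ratio), end-exclusive
    """
    hits = []
    for name, chrom, tss in genes:
        for seg_chrom, start, end, ratio in segments:
            if chrom == seg_chrom and start <= tss < end and ratio > threshold:
                hits.append((name, ratio))
                break
    return hits


# Hypothetical example: one amplified and one balanced segment on chr1.
example_segments = [("chr1", 0, 1000, 1.4), ("chr1", 1000, 2000, 0.2)]
example_genes = [
    ("GENE_A", "chr1", 500),
    ("GENE_B", "chr1", 1500),
    ("GENE_C", "chr2", 500),
]
print(genes_in_amplified_segments(example_genes, example_segments))
```

A log2 ratio of 1 corresponds to a doubling of the normalized read count, so the strict threshold keeps only segments with more than a twofold gain.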