Inferring expressed genes by whole-genome sequencing of plasma DNA

Abstract

The analysis of cell-free DNA (cfDNA) in plasma represents a rapidly advancing field in medicine. cfDNA consists predominantly of nucleosome-protected DNA shed into the bloodstream by cells undergoing apoptosis. We performed whole-genome sequencing of plasma DNA and identified two discrete regions at transcription start sites (TSSs) where nucleosome occupancy results in different read depth coverage patterns for expressed and silent genes. By employing machine learning for gene classification, we found that the plasma DNA read depth patterns from healthy donors reflected the expression signature of hematopoietic cells. In patients with metastatic cancer, we were able to classify expressed cancer driver genes in regions with somatic copy number gains with high accuracy. We were also able to determine the expressed isoform of genes with several TSSs, as confirmed by RNA-seq analysis of the matching primary tumor. Our analyses provide functional information about cells releasing their DNA into the circulation.

Figure 1: Plasma DNA fragment size and patterns of nucleosome positioning.
Figure 2: Nucleosome positioning at transcription start sites.
Figure 3: Classification of expressed and silent genes by plasma DNA read depth analyses.
Figure 4: Procedure for predicting expressed genes in cancer from blood.
Figure 5: Identification of expressed genes in cancer from the peripheral blood.

Accession codes

Accessions

NCBI Reference Sequence

Sequence Read Archive

References

  1. Schwarzenbach, H., Hoon, D.S. & Pantel, K. Cell-free nucleic acids as biomarkers in cancer patients. Nat. Rev. Cancer 11, 426–437 (2011).

  2. Heitzer, E., Auer, M., Ulz, P., Geigl, J.B. & Speicher, M.R. Circulating tumor cells and DNA as liquid biopsies. Genome Med. 5, 73 (2013).

  3. Crowley, E., Di Nicolantonio, F., Loupakis, F. & Bardelli, A. Liquid biopsy: monitoring cancer-genetics in the blood. Nat. Rev. Clin. Oncol. 10, 472–484 (2013).

  4. Diaz, L.A. Jr. & Bardelli, A. Liquid biopsies: genotyping circulating tumor DNA. J. Clin. Oncol. 32, 579–586 (2014).

  5. Heitzer, E., Ulz, P. & Geigl, J.B. Circulating tumor DNA as a liquid biopsy for cancer. Clin. Chem. 61, 112–123 (2015).

  6. Diehl, F. et al. Detection and quantification of mutations in the plasma of patients with colorectal tumors. Proc. Natl. Acad. Sci. USA 102, 16368–16373 (2005).

  7. Lo, Y.M. et al. Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. Sci. Transl. Med. 2, 61ra91 (2010).

  8. Ramachandran, S. & Henikoff, S. Replicating nucleosomes. Sci. Adv. 1, e1500587 (2015).

  9. Snyder, M.W., Kircher, M., Hill, A.J., Daza, R.M. & Shendure, J. Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell 164, 57–68 (2016).

  10. Gaffney, D.J. et al. Controls of nucleosome positioning in the human genome. PLoS Genet. 8, e1003036 (2012).

  11. Schones, D.E. et al. Dynamic regulation of nucleosome positioning in the human genome. Cell 132, 887–898 (2008).

  12. Venkatesh, S. & Workman, J.L. Histone exchange, chromatin structure and the regulation of transcription. Nat. Rev. Mol. Cell Biol. 16, 178–189 (2015).

  13. Valouev, A. et al. Determinants of nucleosome organization in primary human cells. Nature 474, 516–520 (2011).

  14. Chandrananda, D., Thorne, N.P. & Bahlo, M. High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA. BMC Med. Genomics 8, 29 (2015).

  15. Eisenberg, E. & Levanon, E.Y. Human housekeeping genes, revisited. Trends Genet. 29, 569–574 (2013).

  16. Lui, Y.Y. et al. Predominant hematopoietic origin of cell-free DNA in plasma and serum after sex-mismatched bone marrow transplantation. Clin. Chem. 48, 421–427 (2002).

  17. Sun, K. et al. Plasma DNA tissue mapping by genome-wide methylation sequencing for noninvasive prenatal, cancer, and transplantation assessments. Proc. Natl. Acad. Sci. USA 112, E5503–E5512 (2015).

  18. Koh, W. et al. Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proc. Natl. Acad. Sci. USA 111, 7361–7366 (2014).

  19. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  20. Heitzer, E., Ulz, P., Geigl, J.B. & Speicher, M.R. Non-invasive detection of genome-wide somatic copy number alterations by liquid biopsies. Mol. Oncol. 10, 494–502 (2016).

  21. Heidary, M. et al. The dynamic range of circulating tumor DNA in metastatic breast cancer. Breast Cancer Res. 16, 421 (2014).

  22. Heitzer, E. et al. Tumor-associated copy number changes in the circulation of patients with prostate cancer identified through whole-genome sequencing. Genome Med. 5, 30 (2013).

  23. Mohan, S. et al. Changes in colorectal carcinoma genomes under anti-EGFR therapy identified by whole-genome plasma DNA sequencing. PLoS Genet. 10, e1004271 (2014).

  24. Carter, S.L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).

  25. Murtaza, M. et al. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature 497, 108–112 (2013).

  26. Ulz, P., Heitzer, E. & Speicher, M.R. Co-occurrence of MYC amplification and TP53 mutations in human cancer. Nat. Genet. 48, 104–106 (2016).

  27. Giordano, S.H. et al. Systemic therapy for patients with advanced human epidermal growth factor receptor 2–positive breast cancer: American Society of Clinical Oncology clinical practice guideline. J. Clin. Oncol. 32, 2078–2099 (2014).

  28. Helsten, T. et al. The FGFR landscape in cancer: analysis of 4,853 tumors by next-generation sequencing. Clin. Cancer Res. 22, 259–267 (2016).

  29. Bettegowda, C. et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl. Med. 6, 224ra24 (2014).

  30. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).

  31. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).

  32. Ulz, P. et al. Whole-genome plasma sequencing reveals focal amplifications as a driving force in metastatic prostate cancer. Nat. Commun. 7, 12008 (2016).

  33. Adelman, K. & Lis, J.T. Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nat. Rev. Genet. 13, 720–731 (2012).

  34. Ivanov, M., Baranova, A., Butler, T., Spellman, P. & Mileyko, V. Non-random fragmentation patterns in circulating cell-free DNA reflect epigenetic regulation. BMC Genomics 16 (Suppl. 13), S1 (2015).

  35. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  36. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  37. Lai, W., Choudhary, V. & Park, P.J. CGHweb: a tool for comparing DNA copy number segmentations from multiple algorithms. Bioinformatics 24, 1014–1015 (2008).

  38. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).

Acknowledgements

We are grateful to S. Perakis for critical reading and editing of this manuscript. This work was supported by CANCER-ID, a project funded by the Innovative Medicines Joint Undertaking (IMI JU).

Author information

Contributions

P.U. and M.R.S. designed the study. M.A. and R.G. performed the experiments. P.U., G.G.T., J.B.G., E.H., and M.R.S. analyzed data. E.P. and G.P. provided clinical samples and clinical information. S.W.J. and L.A. performed pathology analyses. K.K. conducted RNA-seq. P.U., E.H., and M.R.S. supervised the study. P.U., J.B.G., E.H., and M.R.S. wrote the manuscript. All authors revised the manuscript.

Corresponding author

Correspondence to Michael R Speicher.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Mapping of the nucleosome-depleted region.

Localization of the NDR, mapped by analyzing 100 (red) and 1,000 (orange) highly expressed genes in the 104 plasma samples from healthy donors. The NDR was most often observed in a window from –150 bp to +50 bp relative to the TSS. Blue, the 1,000 most weakly expressed genes.
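As a rough illustration of how read depth could be summarized in TSS-relative windows, the sketch below averages per-base depth over the –150 bp to +50 bp NDR window named in the caption. The ±1,000 bp reading of "2K-TSS", the function name `window_mean_depth`, and the toy depth track are assumptions made for this example; they are not taken from the authors' pipeline.

```python
from statistics import fmean

NDR_WINDOW = (-150, 50)        # window quoted in the caption, relative to the TSS
TSS_2K_WINDOW = (-1000, 1000)  # assumption: "2K-TSS" read as TSS +/- 1,000 bp


def window_mean_depth(depth, tss, window, strand="+"):
    """Mean per-base read depth in a TSS-relative window.

    depth  : sequence of per-base read depths along one chromosome
    tss    : 0-based index of the TSS within `depth`
    window : (upstream_offset, downstream_offset) in bp, inclusive
    strand : "+" or "-"; offsets are mirrored for minus-strand genes
    """
    lo, hi = window
    if strand == "-":
        lo, hi = -hi, -lo
    start = max(tss + lo, 0)
    end = min(tss + hi + 1, len(depth))
    if start >= end:
        raise ValueError("window lies outside the depth track")
    return fmean(depth[start:end])


# Toy depth track (made up): flat depth of 10 with a dip to 2 across the NDR.
depth = [10.0] * 5000
tss = 2500
for pos in range(tss - 150, tss + 51):
    depth[pos] = 2.0

ndr_depth = window_mean_depth(depth, tss, NDR_WINDOW)
tss2k_depth = window_mean_depth(depth, tss, TSS_2K_WINDOW)
```

In practice the per-base depths would come from an aligned BAM file and would be normalized, for example by the sample's mean coverage, before genes are compared.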

Supplementary Figure 2 Classification of the 5,000 most highly and least expressed genes.

Support vector machine (SVM) classification based on normalized 2K-TSS and NDR coverage for the 5,000 most highly and least expressed genes. Red and dark blue circles represent genes correctly assigned to the expressed and unexpressed clusters, respectively, whereas light blue and orange circles represent incorrectly assigned genes (as in Fig. 3b).
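The two-feature classification described in this caption can be sketched with a small linear SVM. The version below is a stand-in written in plain Python (batch subgradient descent on the regularized hinge loss), not the authors' implementation; the kernel, hyperparameters, z-score scaling, and all coverage values are assumptions chosen for illustration. It also assumes that expressed genes show lower coverage in both windows, consistent with a nucleosome-depleted region.

```python
def standardize(X):
    """Return per-feature means and population standard deviations of X."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        stds.append(var ** 0.5 or 1.0)  # fall back to 1.0 for constant features
    return means, stds


def scale(row, means, stds):
    """Z-score one feature vector with previously computed statistics."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=2000):
    """Batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # point violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(row, w, b, means, stds):
    """Return +1 (expressed) or -1 (unexpressed) for raw features."""
    z = scale(row, means, stds)
    return 1 if sum(wj * zj for wj, zj in zip(w, z)) + b >= 0 else -1


# Made-up (2K-TSS, NDR) normalized coverage values, for illustration only.
expressed = [(0.85, 0.45), (0.90, 0.50), (0.80, 0.40), (0.88, 0.55)]
unexpressed = [(1.05, 1.00), (1.10, 0.95), (1.00, 1.05), (1.08, 0.90)]
X_raw = expressed + unexpressed
y = [1] * len(expressed) + [-1] * len(unexpressed)

means, stds = standardize(X_raw)
X = [scale(row, means, stds) for row in X_raw]
w, b = train_linear_svm(X, y)
predictions = [predict(row, w, b, means, stds) for row in X_raw]
```

A library SVM would normally replace the hand-written trainer; the point here is only that each gene is reduced to two coverage features and assigned to one of two classes.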

Supplementary Figure 3 Quantitative relationship between nucleosome occupancy and gene expression.

(a) Correlation between 2K-TSS (left) and NDR (right) coverage and FPKM percentiles. (b) Means and distribution of the 2K-TSS and NDR coverage parameters of genes grouped into deciles. (c) Average FPKM percentile of binned 2K-TSS and NDR coverage parameters.
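The decile grouping in panel b could be computed along the following lines. This is a minimal sketch assuming one FPKM value and one coverage value per gene; the function name `decile_means` and the toy numbers are hypothetical.

```python
def decile_means(fpkm, coverage):
    """Mean coverage per expression decile (index 0 = lowest FPKM)."""
    if len(fpkm) != len(coverage):
        raise ValueError("fpkm and coverage must have the same length")
    n = len(fpkm)
    # Rank genes by expression, then assign each rank to one of ten bins.
    order = sorted(range(n), key=lambda i: fpkm[i])
    bins = [[] for _ in range(10)]
    for rank, i in enumerate(order):
        bins[rank * 10 // n].append(coverage[i])
    return [sum(b) / len(b) if b else None for b in bins]


# Toy data (made up): coverage falls as expression rises.
toy_fpkm = list(range(20))
toy_coverage = [20.0 - i for i in range(20)]
toy_deciles = decile_means(toy_fpkm, toy_coverage)
```

With fewer than ten genes some bins stay empty and are reported as `None`.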

Supplementary Figure 4 Comparison of copy number profiles of the matching primary tumor with plasma DNA.

The copy number profiles of the matching primary tumors B7 (top) and B13 (bottom) were obtained by shallow whole-genome sequencing. Pairwise comparisons of genomic position–mapped profiles revealed high correlations between the copy number profiles (Pearson correlation coefficients = 0.74 (B7) and 0.88 (B13)).
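For reference, a Pearson correlation between two position-matched copy number profiles can be computed as in the sketch below. It assumes both profiles are lists of log2 ratios over the same genomic bins in the same order; the toy values are made up and are not the B7 or B13 data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


# Toy per-bin log2 ratios (made up); the plasma profile is the tumor
# profile at half amplitude, as if diluted by non-tumor DNA.
tumor_profile = [0.0, 0.1, 1.2, 1.1, -0.5, -0.4, 0.0]
plasma_profile = [v / 2 for v in tumor_profile]
r = pearson(tumor_profile, plasma_profile)
```

Because the coefficient is scale invariant, a plasma profile that mirrors the tumor at reduced amplitude still correlates strongly.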

Supplementary Figure 5 Reconstruction of the 12p11.1 nucleosome array with high-coverage sequenced plasma samples.

Assembly of the 12p11.1 nucleosome arrays in plasma samples from B7, B13, controls, and GM12878 for comparison.

Supplementary Figure 6 TSS nucleosome occupancy of unexpressed and housekeeping genes in high-coverage sequenced plasma samples in B7 and B13.

(a,b) Nucleosome occupancy at TSSs of unexpressed genes (fantom.gsc.riken.jp/5/) and housekeeping genes² showed the expected distinct patterns for B7 (a) and B13 (b).

Supplementary Figure 7 Distribution of the prediction consent.

Histogram of prediction consent in merged control (n = 104) data. For the majority of genes, the prediction consent was above 95%; only a few genes had a prediction consent below 75%.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7 and Supplementary Note. (PDF 1610 kb)

Supplementary Table 1

Genes among the 100 most highly expressed, based on plasma RNA-seq data provided by Koh et al., that were predicted to be unexpressed (n = 12). (XLSX 11 kb)

Supplementary Table 2

Genes among the 1,000 most highly expressed, based on plasma RNA-seq data provided by Koh et al., that were predicted to be unexpressed (n = 245). (XLSX 18 kb)

Supplementary Table 3

Subsampling of sequencing data to establish a lower boundary of necessary sequencing coverage. (XLSX 10 kb)

Supplementary Table 4

Prediction and FPKM values for genes in focal amplifications having log2 ratio > 1 in breast cancer case B7. (XLSX 19 kb)

Supplementary Table 5

Prediction and FPKM values for genes in focal amplifications having log2 ratio > 1 in breast cancer case B13. (XLSX 25 kb)
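The log2 ratio > 1 criterion used for Supplementary Tables 4 and 5 corresponds to more than a twofold coverage gain relative to the reference. The sketch below shows one way such a filter could be written; the segment and gene tuples, the names `GENE_A` to `GENE_C`, and all coordinates are hypothetical. It applies only the ratio threshold, since the size cutoff that makes an amplification "focal" is not stated here.

```python
from math import log2


def log2_ratio(sample_depth, control_depth):
    """Log2 ratio of sample to control read depth for one region."""
    return log2(sample_depth / control_depth)


def genes_in_gained_segments(segments, genes, threshold=1.0):
    """Names of genes whose TSS lies in a segment with log2 ratio > threshold.

    segments : list of (chrom, start, end, log2_ratio), coordinates inclusive
    genes    : list of (name, chrom, tss)
    """
    selected = []
    for name, chrom, tss in genes:
        for seg_chrom, start, end, ratio in segments:
            if chrom == seg_chrom and start <= tss <= end and ratio > threshold:
                selected.append(name)
                break
    return selected


# Toy segments and genes (made up).
toy_segments = [
    ("chr1", 1000, 5000, 1.4),
    ("chr1", 5001, 9000, 0.2),
    ("chr2", 0, 4000, 1.0),  # exactly 1.0 does not pass a strict > 1 filter
]
toy_genes = [("GENE_A", "chr1", 2000), ("GENE_B", "chr1", 6000), ("GENE_C", "chr2", 100)]
gained = genes_in_gained_segments(toy_segments, toy_genes)
```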

About this article

Cite this article

Ulz, P., Thallinger, G., Auer, M. et al. Inferring expressed genes by whole-genome sequencing of plasma DNA. Nat Genet 48, 1273–1278 (2016). https://doi.org/10.1038/ng.3648

  • DOI: https://doi.org/10.1038/ng.3648
