SeqControl: process control for DNA sequencing

Abstract

As high-throughput sequencing continues to increase in speed and throughput, routine clinical and industrial application draws closer. These 'production' settings will require enhanced quality monitoring and quality control to optimize output and reduce costs. We developed SeqControl, a framework for predicting sequencing quality and coverage using a set of 15 metrics describing overall coverage, coverage distribution, basewise coverage and basewise quality. Using whole-genome sequences of 27 prostate cancers and 26 normal references, we derived multivariate models that predict sequencing quality and depth. SeqControl robustly predicted how much sequencing was required to reach a given coverage depth (area under the curve (AUC) = 0.993), accurately classified clinically relevant formalin-fixed, paraffin-embedded samples, and made predictions from as little as one-eighth of a sequencing lane (AUC = 0.967). These techniques can be immediately incorporated into existing sequencing pipelines to monitor data quality in real time. SeqControl is available at http://labs.oicr.on.ca/Boutros-lab/software/SeqControl/.
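
The abstract names four families of metrics (overall coverage, coverage distribution, basewise coverage and basewise quality) but does not list the 15 individual metrics. As a rough illustration only, the Python sketch below computes a few generic coverage summaries from a list of per-base read depths. The function name `coverage_summary`, the `min_depth` threshold and the statistics chosen are assumptions made for this example; they are not SeqControl's actual metric definitions.

```python
from statistics import mean, median


def coverage_summary(depths, min_depth=10):
    """Summarize per-base read depths with a few generic coverage statistics.

    NOTE: hypothetical, illustrative metrics; not the 15 SeqControl metrics.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    n = len(depths)
    return {
        # Overall coverage: average and median depth across all positions.
        "mean_depth": mean(depths),
        "median_depth": median(depths),
        # Coverage distribution: share of positions at or above a threshold.
        "fraction_at_min_depth": sum(d >= min_depth for d in depths) / n,
        # Share of positions with no reads at all.
        "fraction_uncovered": sum(d == 0 for d in depths) / n,
    }


if __name__ == "__main__":
    # Toy depths for five positions (made-up numbers).
    print(coverage_summary([0, 5, 10, 20, 15]))
```

In practice the per-base depths would come from an aligned BAM file through a depth-reporting tool, and summaries like these would be computed per lane before being passed to a predictive model.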

Figure 1: Correlations between single-lane and all-lane values.
Figure 2: Summary of tumor linear models for all metrics.
Figure 3: A Random Forest classifier accurately predicts collapsed coverage.
Figure 4: Classification from partial-lane sequencing yields high predictive accuracy.
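
The classification results summarized in the abstract are reported as area under the ROC curve (AUC). For readers unfamiliar with the measure, the sketch below shows one standard way to compute it: AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper's data or code.

```python
def roc_auc(labels, scores):
    """Compute ROC AUC by comparing every positive score with every negative score.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores; higher means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example with made-up scores for four samples.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred lane-level predictions; a rank-based formulation would be preferable for larger sets.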

Acknowledgements

This study was conducted with the support of Movember funds through Prostate Cancer Canada and with the additional support of the Ontario Institute for Cancer Research, funded by the Government of Ontario; with the support of Genome Canada through a Large-Scale Applied Project contract to P.C.B., S. Shah and R. Morin; and with the support of Prostate Cancer Canada, funded by the Movember Foundation, grant #RS2014-01. P.C.B. was supported by a Terry Fox Research Institute New Investigator Award and a Canadian Institutes of Health Research New Investigator Award. The authors thank all members of the Boutros lab for their support and insightful discussions, J. Livingstone and L. Heisler for assistance in data handling, and L. Stein and J. Simpson for critical comments on the manuscript.

Author information

Contributions

P.C.B. and L.C.C. initiated the project. M.F., T.v.d.K., R.G.B., J.D.M. and P.C.B. generated sequencing data. L.C.C., M.A.A., C.C., M.C.-S.-Y., R.d.B., R.E.D. and T.A.B. performed analysis on the sequencing data. L.C.C., M.A.A., C.C., N.J.H. and P.C.B. were responsible for statistical modeling. Research was supervised by R.G.B., J.D.M. and P.C.B. L.C.C. wrote the manuscript, which was edited and approved by all authors.

Corresponding author

Correspondence to Paul C Boutros.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–21, Supplementary Tables 2, 3, 11 and 15–18, Supplementary Note and Supplementary Results (PDF 9909 kb)

Supplementary Table 1

Information about the CPC-GENE sample data. (XLS 29 kb)

Supplementary Table 4

Metric values calculated for all lane groupings of the oldest ten samples (10 tumour, 9 matched normal). (XLSX 190 kb)

Supplementary Table 5

Metric values calculated for all lane groupings of the newest seventeen samples (tumour and matched normal). (XLSX 54 kb)

Supplementary Table 6

Metric data, true outcome and random forest-predicted outcome for all lane-level BAMs (tumour only). (XLSX 48 kb)

Supplementary Table 7

Metric data, true outcome and random forest-predicted outcome for all lane-level BAMs (normal only). (XLSX 34 kb)

Supplementary Table 8

Metric data, true outcome and random forest-predicted outcome for all half-lane BAMs (tumour only). (XLSX 87 kb)

Supplementary Table 9

Metric data, true outcome and random forest-predicted outcome for all quarter-lane BAMs (tumour only). (XLSX 160 kb)

Supplementary Table 10

Metric data, true outcome and random forest-predicted outcome for all eighth-lane BAMs (tumour only). (XLSX 308 kb)

Supplementary Table 12

Summary of preseq v0.0.1 prediction accuracy. (XLSX 25 kb)

Supplementary Table 13

Summary of preseq v0.1.0 prediction accuracy. (XLSX 25 kb)

Supplementary Table 14

Summary of preseq v1.0.0 prediction accuracy. (XLSX 25 kb)

Cite this article

Chong, L., Albuquerque, M., Harding, N. et al. SeqControl: process control for DNA sequencing. Nat Methods 11, 1071–1075 (2014). https://doi.org/10.1038/nmeth.3094

  • DOI: https://doi.org/10.1038/nmeth.3094
