Discovery of potential causative mutations in human coding and noncoding genome with the interactive software BasePlayer

Abstract

Next-generation sequencing (NGS) is routinely applied in life sciences and clinical practice, but interpretation of the massive quantities of genomic data produced has become a critical challenge. The genome-wide mutation analyses enabled by NGS have had a revolutionary impact in revealing the predisposing and driving DNA alterations behind a multitude of disorders. The workflow to identify causative mutations from NGS data, for example in cancer and rare diseases, commonly involves phases such as quality filtering, case–control comparison, genome annotation, and visual validation, which require multiple processing steps and usage of various tools and scripts. To this end, we have introduced an interactive and user-friendly multi-platform-compatible software, BasePlayer, which allows scientists, regardless of bioinformatics training, to carry out variant analysis in disease genetics settings. A genome-wide scan of regulatory regions for mutation clusters can be carried out with a desktop computer in ~10 min with a dataset of 3 million somatic variants in 200 whole-genome-sequenced (WGS) cancers.

Fig. 1: Overview of the NGS data analysis capabilities and features of BasePlayer.
Fig. 2: The main window of BasePlayer, displaying three samples, a genomic region track and a population control data track.
Fig. 3: Variant Manager user interface and functions.
Fig. 4: Candidate genes in the result table and variant visualization.
Fig. 5: Somatic variants in the regulatory genome.
Fig. 6: Variant Manager settings in somatic cluster analysis.
Fig. 7: Somatic clustering results.
Fig. 8: Variant quality validation by read-level inspection.

Data availability

No previously unpublished data sets were generated or analyzed during the current study.

Acknowledgements

We thank T. Kivioja for his guidance in regard to the SELEX data and A. Ollikainen for the voice-over in the demonstration videos. We thank B. Pradhan and L. Kauppi for sharing their unpublished Nanopore data. We also thank M. Aavikko, L. van den Berg, D. Berta, O. Kilpivaara, J. Kondelin, H. Kuisma, Y. Li, M. Mehine, H. Metsola, J. Ravantti, L. Sipilä, T. Tanskanen, P. Vahteristo and N. Välimäki for testing BasePlayer and giving suggestions and additional support. We acknowledge ZeroTurnaround for creating the JRebel plugin for Eclipse (IDE). This work was supported by grants from the Biomedicum Helsinki Foundation; the Cancer Society of Finland; the Emil Aaltonen Foundation; the Juhani Aho Foundation for Medical Research; the Sigrid Juselius Foundation; the Academy of Finland (Finnish Center of Excellence Program 2012–2017, 250345); the European Research Council (ERC, 268648); a European Union Framework Programme 7 Collaborative Project (SYSCOL, 258236); the Nordic Information for Action eScience Center (NIASC); and a Nordic Center of Excellence grant financed by NordForsk (62721 to K.P.).

Author information

Contributions

R.K. designed and developed the protocol. R.K. and E.P. wrote the protocol. I.D. contributed to writing the protocol. I.D., T.C., E.K. and K.P. assisted in developing and testing the software. E.P., V.M. and L.A.A. supervised the research.

Corresponding authors

Correspondence to Riku Katainen or Esa Pitkänen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

1. Katainen, R. et al. Nat. Genet. 47, 818–821 (2015): https://doi.org/10.1038/ng.3335

2. Pradhan, B. et al. Sci. Rep. 7, 14521 (2017): https://doi.org/10.1038/s41598-017-15076-3

3. Donner, I. et al. Genes Chromosomes Cancer 56, 453–459 (2017): https://doi.org/10.1002/gcc.22448

Integrated supplementary information

Supplementary Figure 1 BasePlayer settings for variant analysis in a recessive case.

A family trio and gnomAD exome control files are opened. The son (uppermost sample track) is set as an affected male. The parents are selected accordingly from the dropdown menus. The “Recessive” checkbox is selected in the “Inheritance” tab of the “Variant Manager”.

Supplementary Figure 2 Visualization of long-read sequencing data in BasePlayer.

Three split views are shown, tracking the split mappings for a single long read. An inset info panel shows information on the selected read and a schematic illustration of split read orientations (bottom of the info panel).

Supplementary Figure 3 TF binding affinity change prediction settings in BasePlayer.

(a) Affinity change annotation settings without variant filtering. (b) Affinity change annotation settings with variant filtering. The value limit is set to “1”. (c) Annotation results. The affinity change for each overlapping TF is shown in the variant row of the result table (red circle). In the circled case, the variant occurs at the HOXD12 binding site, which has an affinity score of 6.57 at that locus, and the variant decreases the binding affinity by 1.52. (d) TF motif and variant visualization at sequence-level zoom. Affinity changes for each overlapping TF are reported in the “VCF info” dialog (bottom right) if the track is applied and “report affinity change” is selected in the track settings.
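
The idea of an affinity change at a TF binding site can be illustrated with a minimal sketch. It assumes a log-odds position weight matrix (PWM) and defines the change as the alternate-allele score minus the reference-allele score. The toy count matrix, the pseudocount, the uniform background and all function names are hypothetical; the page does not state how BasePlayer computes its affinity scores, so this is not a description of its implementation.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts (A, C, G, T) into a log2-odds PWM."""
    pwm = []
    for column in counts:
        total = sum(column) + 4 * pseudocount
        pwm.append([math.log2(((c + pseudocount) / total) / background)
                    for c in column])
    return pwm

def score(pwm, site):
    """Sum the PWM weights of the bases observed at each motif position."""
    return sum(pwm[i][BASES.index(base)] for i, base in enumerate(site))

def affinity_change(pwm, ref_site, offset, alt_base):
    """Score difference (alt minus ref) for a substitution at `offset`."""
    alt_site = ref_site[:offset] + alt_base + ref_site[offset + 1:]
    return score(pwm, alt_site) - score(pwm, ref_site)

# Hypothetical 4-bp motif with consensus ACGT (8 observations per position).
TOY_COUNTS = [
    [8, 0, 0, 0],
    [0, 8, 0, 0],
    [0, 0, 8, 0],
    [0, 0, 0, 8],
]

pwm = log_odds_matrix(TOY_COUNTS)
delta = affinity_change(pwm, "ACGT", 1, "T")  # C>T at the second position
print(f"ref score: {score(pwm, 'ACGT'):.2f}, change: {delta:.2f}")
```

A negative change means the variant moves the site away from the motif consensus, which corresponds to the kind of decrease reported for the circled variant in panel (c).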

Supplementary Figure 4 CADD prediction settings in BasePlayer.

(a) The column selector for TSV files. (b) Selected column headers for the CADD TSV file. (c) Track settings for the CADD annotation. (d) Annotation results. CADD annotation is shown in the variant row of the result table (red circle).

Supplementary Figure 5 Variant analysis steps in Procedure Case 1.

The effects of filtering, comparison and annotation on variant counts (top right corner of the Variant Manager) in chromosome 10. (a) Initial setup with no filters applied. (b) Quality and coverage filtering thresholds are set. (c) Only coding variants shared by all samples are visible. (d) Linkage-compatible regions are applied; variants outside these regions are excluded. (e) A control file (gnomAD exomes) is applied, resulting in one shared variant.
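
The sequence of steps in panels (b) to (e) can be sketched as a chain of set operations. This is only an illustration under stated assumptions: the `Variant` record, the thresholds, the region tuples and the function names are hypothetical, and the control step is simplified to excluding any variant present in the control set, whereas a real analysis would typically apply an allele-frequency cutoff.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float
    coverage: int
    coding: bool

    @property
    def key(self):
        """Identity of the variant, independent of per-sample metrics."""
        return (self.chrom, self.pos, self.ref, self.alt)

def passes_quality(v, min_quality=20.0, min_coverage=10):
    """Step (b): hypothetical quality and coverage thresholds."""
    return v.quality >= min_quality and v.coverage >= min_coverage

def shared_coding_keys(samples):
    """Step (c): coding variants that pass the filters in every sample."""
    key_sets = [{v.key for v in variants if v.coding and passes_quality(v)}
                for variants in samples]
    return set.intersection(*key_sets) if key_sets else set()

def in_regions(key, regions):
    """Step (d): True if the variant lies in any (chrom, start, end) region."""
    chrom, pos, _, _ = key
    return any(c == chrom and start <= pos <= end for c, start, end in regions)

def candidate_keys(samples, regions, control_keys):
    """Steps (b) to (e): filter, intersect, restrict to regions, drop controls."""
    shared = shared_coding_keys(samples)
    linked = {k for k in shared if in_regions(k, regions)}
    return linked - control_keys

# Toy data for two affected samples on chromosome 10.
sample_a = [Variant("10", 100, "A", "G", 50.0, 30, True),
            Variant("10", 200, "C", "T", 50.0, 30, True),
            Variant("10", 300, "G", "A", 5.0, 30, True)]
sample_b = [Variant("10", 100, "A", "G", 40.0, 25, True),
            Variant("10", 200, "C", "T", 45.0, 20, True),
            Variant("10", 900, "T", "C", 60.0, 40, True)]
regions = [("10", 50, 500)]            # hypothetical linkage-compatible region
controls = {("10", 200, "C", "T")}     # hypothetical variant seen in controls

result = candidate_keys([sample_a, sample_b], regions, controls)
print(result)
```

Each step can only shrink the candidate set, which mirrors how the variant count in the Variant Manager drops from panel (a) to panel (e).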

Supplementary Figure 6 M-CAP annotation settings in BasePlayer.

(a) Column selector for the M-CAP file. The fourth column is set as “Base”. (b) Track settings for the M-CAP track. The value limit is set to 0.025 and “Intersect” is unselected. The “File format” button opens the “Column selector”.

Supplementary Figure 7 ClinVar annotation settings in BasePlayer.

The “Annotation” checkbox is selected for the VCF track.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7, Supplementary Table 1 and Supplementary Tutorials 1–5

Reporting Summary

Cite this article

Katainen, R., Donner, I., Cajuso, T. et al. Discovery of potential causative mutations in human coding and noncoding genome with the interactive software BasePlayer. Nat Protoc 13, 2580–2600 (2018). https://doi.org/10.1038/s41596-018-0052-3
