Next-generation sequencing (NGS) is routinely applied in life sciences and clinical practice, but interpretation of the massive quantities of genomic data produced has become a critical challenge. The genome-wide mutation analyses enabled by NGS have had a revolutionary impact in revealing the predisposing and driving DNA alterations behind a multitude of disorders. The workflow to identify causative mutations from NGS data, for example in cancer and rare diseases, commonly involves phases such as quality filtering, case–control comparison, genome annotation, and visual validation, which require multiple processing steps and the use of various tools and scripts. To this end, we have introduced an interactive, user-friendly, cross-platform software, BasePlayer, which allows scientists, regardless of bioinformatics training, to carry out variant analysis in disease genetics settings. A genome-wide scan of regulatory regions for mutation clusters can be carried out on a desktop computer in ~10 min for a dataset of 3 million somatic variants in 200 whole-genome-sequenced (WGS) cancers.
No previously unpublished data sets were generated or analyzed during the current study.
We thank T. Kivioja for his guidance in regard to the SELEX data and A. Ollikainen for the voice-over in the demonstration videos. We thank B. Pradhan and L. Kauppi for sharing their unpublished Nanopore data. We also thank M. Aavikko, L. van den Berg, D. Berta, O. Kilpivaara, J. Kondelin, H. Kuisma, Y. Li, M. Mehine, H. Metsola, J. Ravantti, L. Sipilä, T. Tanskanen, P. Vahteristo and N. Välimäki for testing BasePlayer and giving suggestions and additional support. We acknowledge ZeroTurnaround for creating the JRebel plugin for Eclipse (IDE). This work was supported by grants from the Biomedicum Helsinki Foundation; the Cancer Society of Finland; the Emil Aaltonen Foundation; the Juhani Aho Foundation for Medical Research; the Sigrid Juselius Foundation; the Academy of Finland (Finnish Center of Excellence Program 2012–2017, 250345); the European Research Council (ERC, 268648); a European Union Framework Programme 7 Collaborative Project (SYSCOL, 258236); the Nordic Information for Action eScience Center (NIASC); and a Nordic Center of Excellence grant financed by NordForsk (62721 to K.P.).
The authors declare no competing interests.
Key references using this protocol
1. Katainen, R. et al. Nat. Genet. 47, 818–821 (2015): https://doi.org/10.1038/ng.3335
2. Pradhan, B. et al. Sci. Rep. 7, 14521 (2017): https://doi.org/10.1038/s41598-017-15076-3
3. Donner, I. et al. Genes Chromosomes Cancer 56, 453–459 (2017): https://doi.org/10.1002/gcc.22448
Integrated supplementary information
A family trio and gnomAD exome control files are opened. The son (uppermost sample track) is set as an affected male. The parents are selected accordingly from the dropdown menus. The “Recessive” checkbox is selected in the “Inheritance” tab of the “Variant Manager”.
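The recessive setting in this caption can be illustrated with a minimal Python sketch of a homozygous-recessive trio filter. This is not BasePlayer's implementation, which is not described here; the `TrioCall` type, the genotype-string format, and the function names are assumptions made for illustration. Compound heterozygotes, X-linked inheritance in the affected male, and control-frequency filtering are deliberately left out.

```python
from typing import NamedTuple


class TrioCall(NamedTuple):
    """Hypothetical record holding one variant's genotypes in a family trio."""
    chrom: str
    pos: int
    child: str   # VCF-style genotype string, e.g. "1/1"
    mother: str
    father: str


def alt_count(genotype: str) -> int:
    """Count non-reference alleles in a genotype string such as '0/1' or '1|1'."""
    alleles = genotype.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele not in ("0", "."))


def fits_homozygous_recessive(call: TrioCall) -> bool:
    """True if the child is homozygous and both parents are heterozygous carriers."""
    return (
        alt_count(call.child) == 2
        and alt_count(call.mother) == 1
        and alt_count(call.father) == 1
    )


# Toy data: only the first call matches the homozygous-recessive pattern.
calls = [
    TrioCall("chr1", 1000, "1/1", "0/1", "0/1"),
    TrioCall("chr1", 2000, "0/1", "0/1", "0/0"),
    TrioCall("chr2", 3000, "1/1", "1/1", "0/1"),
]
candidates = [call for call in calls if fits_homozygous_recessive(call)]
```

Requiring both parents to be heterozygous is one reasonable reading of "recessive" for unaffected parents; a real analysis would also need to handle missing genotypes and hemizygous calls.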
Three split views are shown, tracking the split mappings for a single long read. An inset info panel shows information on the selected read and a schematic illustration of split read orientations (bottom of the info panel).
(a) Affinity change annotation settings without variant filtering. (b) Affinity change annotation settings with variant filtering. Value limit is set to “1”. (c) Annotation results. The affinity change for each overlapping TF is shown in the variant row of the result table (red circle). In the circled case, the variant occurs at the HOXD12 binding site, which has an affinity score of 6.57 at that locus, and the variant decreases the binding affinity by 1.52. (d) TF motif and variant visualization at sequence-level zoom. Affinity changes for each overlapping TF are reported in the “VCF info” dialog (bottom right) if the track is applied and “report affinity change” is selected in the track settings.
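One common way to express an affinity change of the kind reported in this caption is as the difference between the position weight matrix (PWM) log-odds scores of the variant and reference sequences. The Python sketch below assumes that definition; the source does not state how BasePlayer computes its scores. The count matrix is a made-up four-position motif, not the HOXD12 profile, and the uniform background and pseudocount are illustrative choices.

```python
import math

# Hypothetical 4-position count matrix (one list per base); not a real TF motif.
COUNTS = {
    "A": [8, 1, 1, 7],
    "C": [1, 1, 7, 1],
    "G": [1, 7, 1, 1],
    "T": [0, 1, 1, 1],
}
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 0.5   # avoids log(0) for unobserved bases


def log_odds_matrix(counts):
    """Convert per-position base counts into log2-odds scores against the background."""
    length = len(counts["A"])
    matrix = []
    for i in range(length):
        total = sum(counts[base][i] + PSEUDOCOUNT for base in "ACGT")
        matrix.append({
            base: math.log2(((counts[base][i] + PSEUDOCOUNT) / total) / BACKGROUND)
            for base in "ACGT"
        })
    return matrix


def score(site, matrix):
    """Sum the log-odds contributions of each base in a site of motif length."""
    return sum(column[base] for column, base in zip(matrix, site))


def affinity_change(ref_site, offset, alt_base, matrix):
    """Score difference (variant minus reference) for a single-base substitution."""
    alt_site = ref_site[:offset] + alt_base + ref_site[offset + 1:]
    return score(alt_site, matrix) - score(ref_site, matrix)


matrix = log_odds_matrix(COUNTS)
# Replacing the favoured G at motif position 1 with T lowers the score.
change = affinity_change("AGCA", 1, "T", matrix)
```

Under this definition a negative value means the variant weakens the predicted binding site, which would correspond to the decrease of 1.52 from a reference score of 6.57 described in panel (c).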
(a) The column selector for TSV files. (b) Selected column headers for the CADD TSV file. (c) Track settings for the CADD annotation. (d) Annotation results. CADD annotation is shown in the variant row of the result table (red circle).
The effects of filtering, comparison and annotation on variant counts (top right corner of the Variant Manager) on chromosome 10. (a) Initial setup with no filters applied. (b) Quality and coverage filtering thresholds are set. (c) Only coding variants shared by all samples are visible. (d) Linkage-compatible regions are applied; variants outside these regions are excluded. (e) A control file (gnomAD exomes) is applied, resulting in one shared variant.
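The stepwise narrowing shown in panels (a) to (e) can be sketched as a cascade of filters that records the variant count after each step. The Python example below is a toy illustration, not BasePlayer code: the `Variant` fields, the thresholds, and the treatment of controls as a simple presence flag (rather than an allele-frequency cutoff) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record with the fields needed by the toy filters."""
    pos: int
    quality: float
    coverage: int
    coding: bool
    carriers: frozenset   # names of samples carrying the variant
    in_controls: bool     # simplification: seen in the control set or not


def run_cascade(variants, samples, regions, min_quality, min_coverage):
    """Apply the filters in order and return the survivors plus per-step counts."""
    steps = [
        ("quality and coverage",
         lambda v: v.quality >= min_quality and v.coverage >= min_coverage),
        ("coding, shared by all samples",
         lambda v: v.coding and samples <= v.carriers),
        ("inside linkage-compatible regions",
         lambda v: any(start <= v.pos <= end for start, end in regions)),
        ("absent from controls",
         lambda v: not v.in_controls),
    ]
    counts = [("no filters", len(variants))]
    remaining = list(variants)
    for label, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((label, len(remaining)))
    return remaining, counts


samples = frozenset({"s1", "s2"})
both = frozenset({"s1", "s2"})
variants = [
    Variant(100, 50.0, 30, True, both, False),               # passes every step
    Variant(200, 10.0, 30, True, both, False),               # low quality
    Variant(300, 50.0, 30, True, frozenset({"s1"}), False),  # not shared
    Variant(900, 50.0, 30, True, both, False),               # outside regions
    Variant(400, 50.0, 30, True, both, True),                # present in controls
]
remaining, counts = run_cascade(
    variants, samples, regions=[(50, 500)], min_quality=20.0, min_coverage=10
)
for label, n in counts:
    print(f"{label}: {n}")
```

Keeping the count after every step mirrors the running total shown in the top right corner of the Variant Manager and makes it easy to see which filter removes the most variants.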
(a) Column selector for the M-CAP file. The fourth column is set as “Base”. (b) Track settings for the M-CAP track. The value limit is set to 0.025 and “Intersect” is unselected. The “File format” button opens the “Column selector”.
“Annotation” checkbox is selected for the VCF track.
About this article
Cite this article
Katainen, R., Donner, I., Cajuso, T. et al. Discovery of potential causative mutations in human coding and noncoding genome with the interactive software BasePlayer. Nat Protoc 13, 2580–2600 (2018). https://doi.org/10.1038/s41596-018-0052-3