Accurate identification of single-nucleotide variants in whole-genome-amplified single cells

Dong, Xiao; Zhang, Lei; Milholland, Brandon; Lee, Moonsook; Maslov, Alexander Y; Wang, Tao; Vijg, Jan

doi:10.1038/nmeth.4227

Brief Communication
Published: 20 March 2017

Nature Methods volume 14, pages 491–493 (2017)

A Corrigendum to this article was published on 01 December 2017

This article has been updated

Abstract

Mutation analysis in single-cell genomes is prone to artifacts associated with cell lysis and whole-genome amplification. Here we addressed these issues by developing single-cell multiple displacement amplification (SCMDA) and a general-purpose single-cell-variant caller, SCcaller (https://github.com/biosinodx/SCcaller/). By comparing SCMDA-amplified single cells with unamplified clones from the same population, we validated the procedure as a firm foundation for standardized somatic-mutation analysis in single-cell genomics.

**Figure 1: Experimental design for validating SNV identification in SCMDA-amplified single cells.**

**Figure 2: Accuracy of SCcaller in single-cell SNV calling.**

**Figure 3: Frequency, spectrum and distribution of somatic SNVs.**

Accession codes

Primary accessions

Sequence Read Archive

SRP067062

Change history

13 October 2017
In the version of this article initially published, Lodato, M.A. et al. Science 350, 94–98 (2015) (reference 2) was cited as an example of a single-cell sequencing study with high CG-to-TA transitions that applies heat lysis. However, that work used alkaline lysis on ice (Walsh, C.A. and Lodato, M.A., personal communication); therefore, we have changed the third sentence of the paper from "This pathway may explain the observed excess of such mutations in single neurons² compared with unamplified neuronal clones³" to "Amplification artifacts could, in general, explain the observed excess of such mutations in single neurons² compared with unamplified DNA from neuronal clones³." The error has been corrected in the HTML and PDF versions of the article.
01 December 2017
Nat. Methods 14, 491–493 (2017); published online 20 March 2017; corrected after print 13 October 2017. In the version of this article initially published, Lodato, M.A. et al. Science 350, 94–98 (2015) (reference 2) was cited as an example of a single-cell sequencing study with high CG-to-TA transitions that applies heat lysis.

References

1. Fryxell, K.J. & Zuckerkandl, E. Mol. Biol. Evol. 17, 1371–1383 (2000).
2. Lodato, M.A. et al. Science 350, 94–98 (2015).
3. Hazen, J.L. et al. Neuron 89, 1223–1236 (2016).
4. Lasken, R.S. Biochem. Soc. Trans. 37, 450–453 (2009).
5. Gundry, M., Li, W., Maqbool, S.B. & Vijg, J. Nucleic Acids Res. 40, 2032–2040 (2012).
6. Fu, Y. et al. Proc. Natl. Acad. Sci. USA 112, 11923–11928 (2015).
7. Zong, C., Lu, S., Chapman, A.R. & Xie, X.S. Science 338, 1622–1626 (2012).
8. McKenna, A. et al. Genome Res. 20, 1297–1303 (2010).
9. Zafar, H., Wang, Y., Nakhleh, L., Navin, N. & Chen, K. Nat. Methods 13, 505–507 (2016).
10. Cibulskis, K. et al. Nat. Biotechnol. 31, 213–219 (2013).
11. Koboldt, D.C. et al. Genome Res. 22, 568–576 (2012).
12. Behjati, S. et al. Nature 513, 422–425 (2014).
13. Hanawalt, P.C. & Spivak, G. Nat. Rev. Mol. Cell Biol. 9, 958–970 (2008).
14. Gundry, M. & Vijg, J. Mutat. Res. 729, 1–15 (2012).
15. Dong, X. et al. Protocol Exchange http://dx.doi.org/10.1038/protex.2017.061 (2017).
16. Park, C.H. et al. J. Invest. Dermatol. 123, 1012–1019 (2004).
17. Falanga, V. et al. J. Invest. Dermatol. 105, 27–31 (1995).
18. Li, H. & Durbin, R. Bioinformatics 25, 1754–1760 (2009).
19. Abecasis, G.R. et al. Nature 491, 56–65 (2012).

Acknowledgements

This research was supported by the NIH (grants AG017242, AG047200 and AG038072 to J.V.) and the Glenn Foundation for Medical Research (J.V.). We thank H. Choi (Seoul National University) for providing materials.

Author information

Xiao Dong, Lei Zhang and Brandon Milholland: These authors contributed equally to this work.

Authors and Affiliations

Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
Xiao Dong, Lei Zhang, Brandon Milholland, Moonsook Lee, Alexander Y Maslov & Jan Vijg
Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, New York, USA
Tao Wang
Department of Ophthalmology & Visual Sciences, Albert Einstein College of Medicine, Bronx, New York, USA
Jan Vijg

Contributions

J.V., L.Z. and X.D. conceived this study and designed the experiments. L.Z., B.M. and M.L. performed the experiments. X.D. developed the software. X.D. and T.W. analyzed the data. J.V., X.D., L.Z., B.M., T.W. and A.Y.M. wrote the manuscript.

Corresponding author

Correspondence to Jan Vijg.

Ethics declarations

Competing interests

X.D., L.Z., M.L., A.M. and J.V. are cofounders of SingulOmics Corp.

Integrated supplementary information

Supplementary Figure 1 Schematic representation of enrichment of amplification errors through allelic bias.

(a) Without allelic amplification bias, an amplification error occurring in the first round of amplification of a diploid genome (the worst-case scenario) would appear in only 12.5% of the sequencing reads, corresponding to 1 out of 8 strands; a true heterozygous SNV (or SNP) would appear in about 50% of the reads. (b) With allelic amplification bias, the fraction of reads carrying the amplification error is enriched and could easily reach 50% or more of the reads.
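
The arithmetic behind the 12.5% figure can be reproduced with a small sketch. It assumes a toy model that is not taken from the paper: after the first round of amplification a diploid locus is represented by 8 strands, one of which carries the error, and each strand contributes reads in proportion to a relative amplification weight. The function name and the weights are hypothetical and chosen only to illustrate how uneven amplification enriches an error.

```python
def error_read_fraction(strand_weights, error_index=0):
    """Fraction of reads expected to carry the error.

    strand_weights: relative amplification weight of each strand present
    after the first round (hypothetical values, toy model only).
    error_index: position of the strand that carries the amplification error.
    """
    total = sum(strand_weights)
    if total <= 0:
        raise ValueError("strand weights must sum to a positive value")
    return strand_weights[error_index] / total


# No bias: all 8 strands amplify equally, so the error is on 1 of 8 strands.
unbiased = error_read_fraction([1.0] * 8)

# Assumed bias: the error-carrying strand amplifies 7x more than the others.
biased = error_read_fraction([7.0] + [1.0] * 7)

print(unbiased)  # 0.125
print(biased)    # 0.5
```

Under these assumed weights the error reaches the read fraction expected of a true heterozygous variant, which is why read fraction alone cannot separate the two cases.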

Supplementary Figure 2 Isolation of a single cell by using the CellRaft system.

(a) The raft left of number “0804” contains one single fibroblast (red circle). (b) The target raft containing the single cell next to “0804” was collected into a PCR tube using the collection wand. This left the well empty (red circle). The two scratches in the target raft are caused by the needle of the release device, a part of the CellRaft system. The needle was used to dislodge the raft containing the cell from the CellRaft array. This system essentially precludes the capture of more than one cell.

Supplementary Figure 3 Validation of raft number in the PCR tube.

(a) A 0.2-ml PCR tube with 2.5 μl PBS containing one raft collected by the CellRaft system (magnified). The small brown dot at the bottom of the PCR tube is the raft (arrow). (b) Two rafts (arrows) were collected into the same PCR tube, as is clear from the two magnified brown dots in the same tube. With this system it is easy to check visually whether more than one cell was captured in the same tube.

Supplementary Figure 4 Lorenz curves to assess coverage uniformity.

The diagonal line represents completely uniform amplification. Whole genome amplified cells showed modest locus bias as compared to unamplified bulk DNA.
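
As an illustration of how such a curve can be built, the sketch below computes Lorenz curve points from a list of per-position (or per-bin) read depths. It is a generic construction, not the authors' script; the function name and the toy depth values are assumptions.

```python
def lorenz_curve(depths):
    """Return (xs, ys) for a Lorenz curve of sequencing depth.

    xs: cumulative fraction of genomic positions, sorted from lowest
        to highest depth.
    ys: cumulative fraction of total sequenced bases covering them.
    Perfectly uniform coverage gives ys == xs (the diagonal).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    ordered = sorted(depths)
    total = sum(ordered)
    if total == 0:
        raise ValueError("total depth must be positive")
    n = len(ordered)
    xs, ys = [0.0], [0.0]
    running = 0
    for i, depth in enumerate(ordered, start=1):
        running += depth
        xs.append(i / n)
        ys.append(running / total)
    return xs, ys


# Toy depths (hypothetical): uniform coverage versus strongly skewed coverage.
uniform_x, uniform_y = lorenz_curve([10, 10, 10, 10])
skewed_x, skewed_y = lorenz_curve([0, 0, 0, 40])
```

The further the curve sags below the diagonal, the more reads are concentrated in a small part of the genome.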

Supplementary Figure 5 Allelic fraction of heterozygous SNPs as a percentage of sequencing reads affected in kindred clone and single cells.

We took hSNPs identified from the unamplified kindred clone (IL1C) in a randomly selected region (chr1:100,000,000-110,000,000), and examined what percentage of the sequencing reads in the two kindred single cells (IL11 and IL12) contained these hSNPs. Ideally, this should be 50% of the reads containing the variant for all hSNPs tested. Allelic fractions of these hSNPs were plotted as percentage of the sequencing reads for kindred cell IL11 (b), IL12 (c) and IL1C itself (a). There is little bias in the clone, moderate bias in cell IL11 and severe bias in cell IL12. “Densities” of y-axis are kernel density estimates using “density” function in R software with default parameters.

Supplementary Figure 6 Correlation of observed and estimated major allele fraction by ‘leave-100-out cross-validations’.

The cross-validations were performed for cells IL11 and IL12 with two settings of λ (a–d) in a randomly selected region, chr1:100,000,000-110,000,000. λ denotes half of the window width used for smoothing (Methods). The amplification bias of IL12 was predicted more accurately than that of IL11 because there is more amplification bias in IL12; hence, the known heterozygous SNPs of IL12 are more informative about bias.

Supplementary Figure 7 Steps in variant calling by SCcaller.

First, local allelic bias is estimated using a kernel smoother based on hSNPs. Second, SNVs are identified using a likelihood ratio test. Third, SNVs not present in the bulk alignment are designated as true somatic SNVs rather than SNPs.
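
The first step can be sketched as a kernel-weighted average of allele fractions at nearby hSNPs. This is a minimal illustration, not SCcaller's implementation: the Epanechnikov kernel, the depth weighting, the fallback value of 0.5 and all names are assumptions, and the sketch ignores phasing between neighboring hSNPs. Only the use of a kernel smoother over hSNPs and a half-window width (λ) come from the text.

```python
def epanechnikov(u):
    """Kernel weight for a scaled distance u; zero outside [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def estimate_local_bias(position, hsnps, half_width):
    """Kernel-smoothed variant-allele fraction around `position`.

    hsnps: iterable of (pos, ref_reads, alt_reads) for known heterozygous
           SNPs on the same chromosome (hypothetical input layout).
    half_width: half of the smoothing window width, in base pairs.
    Returns 0.5 (no bias) when no hSNP with reads falls inside the window.
    """
    weighted_alt = 0.0
    weighted_depth = 0.0
    for pos, ref_reads, alt_reads in hsnps:
        depth = ref_reads + alt_reads
        if depth == 0:
            continue
        k = epanechnikov((pos - position) / half_width)
        weighted_alt += k * alt_reads      # nearer hSNPs count more
        weighted_depth += k * depth        # deeper hSNPs count more
    if weighted_depth == 0.0:
        return 0.5
    return weighted_alt / weighted_depth


# Toy data: two nearby hSNPs and one far outside the window.
toy_hsnps = [(100, 5, 5), (200, 2, 8), (100000, 10, 0)]
local_bias = estimate_local_bias(150, toy_hsnps, half_width=100)  # about 0.65
```

A wider window averages over more hSNPs and gives a smoother but less local estimate.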

Supplementary Figure 8 Distribution of log likelihood ratios of variant-calling models.

The ratio between the likelihoods of the null model (L₀) and of the heterozygous SNV model (L₁) was plotted. The black dashed line indicates the cutoff for the log likelihood ratio, which corresponds to an alpha level of 0.01 in a likelihood ratio test. These results indicate that, with this test and cutoff, real mutations can be separated from artifacts. Real mutations and amplification artifacts were determined by comparing data from single cells to their kindred clone (Fig. 1b). “Densities” of y-axis are kernel density estimates using “density” function in R software with default parameters.
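
To make the quantity on the x-axis concrete, the sketch below computes log(L₀/L₁) for one candidate site under simple binomial read-count models. The binomial form and the example fractions are assumptions made for illustration; the models actually used by SCcaller are defined in the paper's Methods, which are not reproduced here, and no cutoff value is implied.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k variant reads among n reads."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite at the extremes
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def log_likelihood_ratio(alt_reads, depth, null_fraction, het_fraction):
    """log(L0 / L1): negative values favor the heterozygous SNV model.

    null_fraction: expected variant-read fraction under the null model
                   (assumed value supplied by the caller).
    het_fraction:  expected variant-read fraction for a true heterozygous
                   SNV, e.g. a local allelic bias estimate (assumed).
    """
    l0 = log_binom_pmf(alt_reads, depth, null_fraction)
    l1 = log_binom_pmf(alt_reads, depth, het_fraction)
    return l0 - l1


# Toy sites at depth 20, with assumed fractions 0.125 (null) and 0.5 (het).
strong_site = log_likelihood_ratio(10, 20, 0.125, 0.5)  # favors L1
weak_site = log_likelihood_ratio(2, 20, 0.125, 0.5)     # favors L0
```

A site would be called only when its ratio falls on the heterozygous side of whatever cutoff the chosen alpha level implies.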

Supplementary Figure 9 Spectra and numbers of somatic SNVs identified with different variant callers.

Mutation spectra of candidate somatic SNVs called by (a) Monovar; (b) MuTect; and (c) VarScan. The results from SCcaller are shown in Fig. 3b. The error bars indicate standard deviations. (d) The number of candidate somatic SNVs per cell called by each variant caller. The error bars indicate standard deviations. Sample size n=4, 6 and 2 for the clones, SCMDA and HighTemp MDA, respectively.

Supplementary Figure 10 Somatic SNVs in the functional genome.

(a) Enrichment and depletion of somatic SNVs in genomic features. The asterisks indicate a significant depletion (P < 0.01, two-tailed pair-wise t test) compared to the genome average. Data on germline polymorphisms were obtained from the 1000 Genomes Project. The error bars indicate standard deviations. Sample size n=10 for the somatic SNVs, i.e., 4 clones and 6 single cells. (b) Mutant genes are less highly expressed than wildtype genes. The average FPKM (fragments per kilobase of exon per million fragments mapped) value (red dashed line) of genes affected by somatic SNVs in their exon regions was compared with the average FPKM values (black line) of 2,000 random gene sets (each with the same number of genes as the mutant gene set). The P value was calculated from the permutation test (one-tailed). The RNA sequencing data were downloaded from the ENCODE project (ID: ENCFF640FPG and ENCFF704TVE). “Densities” of y-axis are kernel density estimates using “density” function in R software with default parameters.
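
The comparison in panel (b) can be sketched as a one-tailed permutation test: draw random gene sets of the same size as the mutant set and count how often their mean FPKM is at or below the observed mean. This is an illustrative reconstruction, not the authors' code; the function name, the seed and the toy FPKM values are hypothetical, and only the set size matching and the 2,000 random sets come from the caption.

```python
import random


def permutation_p_value(observed_mean, all_fpkm, set_size, n_sets=2000, seed=0):
    """One-tailed P value for 'mutant genes are less highly expressed'.

    all_fpkm: FPKM values of all genes to sample from.
    set_size: number of genes in the mutant gene set.
    Returns the fraction of random gene sets whose mean FPKM is less than
    or equal to the observed mean of the mutant gene set.
    """
    if set_size <= 0 or set_size > len(all_fpkm):
        raise ValueError("set_size must be between 1 and len(all_fpkm)")
    rng = random.Random(seed)  # fixed seed for reproducibility
    at_or_below = 0
    for _ in range(n_sets):
        sample = rng.sample(all_fpkm, set_size)  # draw without replacement
        if sum(sample) / set_size <= observed_mean:
            at_or_below += 1
    return at_or_below / n_sets


# Toy expression values (hypothetical), not data from the study.
toy_fpkm = [float(v) for v in range(1, 101)]
p_low = permutation_p_value(observed_mean=0.0, all_fpkm=toy_fpkm, set_size=10)
p_high = permutation_p_value(observed_mean=1000.0, all_fpkm=toy_fpkm, set_size=10)
```

Some analyses add one to both the count and the number of sets so that the P value is never exactly zero; that variant is a design choice and is not implied by the caption.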

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10, Supplementary Tables 1–5 and Supplementary Note (PDF 1473 kb)

Supplementary Software

SCcaller (version 1.0) (ZIP 22 kb)

Supplementary Protocol

Single-Cell Multiple Displacement Amplification (SCMDA) Protocol (PDF 143 kb)

About this article

Cite this article

Dong, X., Zhang, L., Milholland, B. et al. Accurate identification of single-nucleotide variants in whole-genome-amplified single cells. Nat Methods 14, 491–493 (2017). https://doi.org/10.1038/nmeth.4227

Received: 17 June 2016
Accepted: 22 February 2017
Published: 20 March 2017
Issue Date: May 2017
DOI: https://doi.org/10.1038/nmeth.4227

