Accurate identification of single-nucleotide variants in whole-genome-amplified single cells

  • A Corrigendum to this article was published on 01 December 2017

Abstract

Mutation analysis in single-cell genomes is prone to artifacts associated with cell lysis and whole-genome amplification. Here we addressed these issues by developing single-cell multiple displacement amplification (SCMDA) and a general-purpose single-cell-variant caller, SCcaller (https://github.com/biosinodx/SCcaller/). By comparing SCMDA-amplified single cells with unamplified clones from the same population, we validated the procedure as a firm foundation for standardized somatic-mutation analysis in single-cell genomics.

Figure 1: Experimental design for validating SNV identification in SCMDA-amplified single cells.
Figure 2: Accuracy of SCcaller in single-cell SNV calling.
Figure 3: Frequency, spectrum and distribution of somatic SNVs.

Accession codes

Primary accessions

Sequence Read Archive

Referenced accessions

Sequence Read Archive

Change history

  • 13 October 2017

    In the version of this article initially published, Lodato, M.A. et al. Science 350, 94–98 (2015) (reference 2) was cited as an example of a single-cell sequencing study with high CG-to-TA transitions that applies heat lysis. However, that work used alkaline lysis on ice (Walsh, C.A. and Lodato, M.A., personal communication); therefore, we have changed the third sentence of the paper from "This pathway may explain the observed excess of such mutations in single neurons2 compared with unamplified neuronal clones3" to "Amplification artifacts could, in general, explain the observed excess of such mutations in single neurons2 compared with unamplified DNA from neuronal clones3." The error has been corrected in the HTML and PDF versions of the article.

  • 01 December 2017

    Nat. Methods 14, 491–493 (2017); published online 20 March 2017; corrected after print 13 October 2017. In the version of this article initially published, Lodato, M.A. et al. Science 350, 94–98 (2015) (reference 2) was cited as an example of a single-cell sequencing study with high CG-to-TA transitions that applies heat lysis.

References

  1. Fryxell, K.J. & Zuckerkandl, E. Mol. Biol. Evol. 17, 1371–1383 (2000).
  2. Lodato, M.A. et al. Science 350, 94–98 (2015).
  3. Hazen, J.L. et al. Neuron 89, 1223–1236 (2016).
  4. Lasken, R.S. Biochem. Soc. Trans. 37, 450–453 (2009).
  5. Gundry, M., Li, W., Maqbool, S.B. & Vijg, J. Nucleic Acids Res. 40, 2032–2040 (2012).
  6. Fu, Y. et al. Proc. Natl. Acad. Sci. USA 112, 11923–11928 (2015).
  7. Zong, C., Lu, S., Chapman, A.R. & Xie, X.S. Science 338, 1622–1626 (2012).
  8. McKenna, A. et al. Genome Res. 20, 1297–1303 (2010).
  9. Zafar, H., Wang, Y., Nakhleh, L., Navin, N. & Chen, K. Nat. Methods 13, 505–507 (2016).
  10. Cibulskis, K. et al. Nat. Biotechnol. 31, 213–219 (2013).
  11. Koboldt, D.C. et al. Genome Res. 22, 568–576 (2012).
  12. Behjati, S. et al. Nature 513, 422–425 (2014).
  13. Hanawalt, P.C. & Spivak, G. Nat. Rev. Mol. Cell Biol. 9, 958–970 (2008).
  14. Gundry, M. & Vijg, J. Mutat. Res. 729, 1–15 (2012).
  15. Dong, X. et al. Protocol Exchange http://dx.doi.org/10.1038/protex.2017.061 (2017).
  16. Park, C.H. et al. J. Invest. Dermatol. 123, 1012–1019 (2004).
  17. Falanga, V. et al. J. Invest. Dermatol. 105, 27–31 (1995).
  18. Li, H. & Durbin, R. Bioinformatics 25, 1754–1760 (2009).
  19. Abecasis, G.R. et al. Nature 491, 56–65 (2012).

Acknowledgements

This research was supported by the NIH (grants AG017242, AG047200 and AG038072 to J.V.) and the Glenn Foundation for Medical Research (J.V.). We thank H. Choi (Seoul National University) for providing materials.

Author information

Contributions

J.V., L.Z. and X.D. conceived this study and designed the experiments. L.Z., B.M. and M.L. performed the experiments. X.D. developed the software. X.D. and T.W. analyzed the data. J.V., X.D., L.Z., B.M., T.W. and A.Y.M. wrote the manuscript.

Corresponding author

Correspondence to Jan Vijg.

Ethics declarations

Competing interests

X.D., L.Z., M.L., A.M. and J.V. are cofounders of SingulOmics Corp.

Integrated supplementary information

Supplementary Figure 1 Schematic representation of enrichment of amplification errors through allelic bias.

(a) When there is no allelic amplification bias, an amplification error occurring in the first round of amplification (the worst-case scenario) would appear in only 12.5% of the sequencing reads, corresponding to 1 out of the 8 strands of a diploid genome after that round; a true heterozygous SNV (or SNP) would appear in about 50% of the reads. (b) When there is allelic amplification bias, the fraction of reads carrying the amplification error is enriched and could easily reach 50% or more of the reads.
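
The arithmetic behind the two panels can be reproduced with a toy model. The sketch below assumes that each of the 8 strands present after the first round contributes reads in proportion to a relative amplification weight; the function name, the weights and the 7-fold bias are hypothetical illustrations and are not taken from the paper.

```python
def error_read_fraction(strand_weights, error_strand):
    """Fraction of reads expected to carry an error that sits on one strand.

    strand_weights: relative amplification yield of each strand (toy assumption).
    error_strand: index of the strand that carries the amplification error.
    """
    total = sum(strand_weights)
    if total <= 0:
        raise ValueError("strand weights must sum to a positive value")
    return strand_weights[error_strand] / total


# (a) No bias: 8 equally amplified strands, error on one of them.
unbiased = error_read_fraction([1] * 8, error_strand=0)

# (b) Bias: the strand carrying the error is amplified 7-fold more
# than each of the other strands (an arbitrary illustrative value).
biased = error_read_fraction([7] + [1] * 7, error_strand=0)

print(f"{unbiased:.3f} {biased:.3f}")
```

Under these assumed weights the error is seen in 12.5% of reads without bias and in 50% of reads with bias, which is the range in which it becomes hard to tell from a true heterozygous variant by read fraction alone.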

Supplementary Figure 2 Isolation of a single cell by using the CellRaft system.

(a) The raft left of number “0804” contains one single fibroblast (red circle). (b) The target raft containing the single cell next to “0804” was collected into a PCR tube using the collection wand. This left the well empty (red circle). The two scratches in the target raft are caused by the needle of the release device, a part of the CellRaft system. The needle was used to dislodge the raft containing the cell from the CellRaft array. This system essentially precludes the capture of more than one cell.

Supplementary Figure 3 Validation of raft number in the PCR tube.

(a) A 0.2-ml PCR tube with 2.5 μl PBS containing one raft collected by the CellRaft system (magnified). The small brown dot at the bottom of the PCR tube is the raft (arrow). (b) Two rafts (arrows) were collected into the same PCR tube. This is very clear from the two magnified brown dots in the same PCR tube. Using this system it is easy to visually check if more than one cell was captured in the same tube.

Supplementary Figure 4 Lorenz curves to assess coverage uniformity.

The diagonal line represents completely uniform amplification. Whole-genome-amplified cells showed modest locus bias compared with unamplified bulk DNA.
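
As a minimal sketch of how such a curve can be computed, the code below builds Lorenz curve points from per-bin read depths. The bin depths are made-up values, and the Gini coefficient is added here only as a convenient one-number summary; it is an assumption of this sketch and is not reported in the legend.

```python
def lorenz_curve(depths):
    """Return (x, y) points: cumulative fraction of bins vs. of coverage."""
    ordered = sorted(depths)
    total = sum(ordered)
    if not ordered or total <= 0:
        raise ValueError("need at least one bin with non-zero coverage")
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, depth in enumerate(ordered, start=1):
        running += depth
        points.append((i / n, running / total))
    return points


def gini(points):
    """One minus twice the area under the Lorenz curve (0 means uniform)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid rule
    return 1 - 2 * area


# Hypothetical per-bin depths: perfectly uniform vs. all reads in one bin.
uniform = lorenz_curve([10, 10, 10, 10])
skewed = lorenz_curve([0, 0, 0, 40])
print(gini(uniform), gini(skewed))
```

Uniform coverage falls on the diagonal, and the further a sample's curve bows below it, the more uneven the amplification.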

Supplementary Figure 5 Allelic fraction of heterozygous SNPs as a percentage of sequencing reads affected in kindred clone and single cells.

We took hSNPs identified from the unamplified kindred clone (IL1C) in a randomly selected region (chr1:100,000,000-110,000,000), and examined what percentage of the sequencing reads in the two kindred single cells (IL11 and IL12) contained these hSNPs. Ideally, this should be 50% of the reads containing the variant for all hSNPs tested. Allelic fractions of these hSNPs were plotted as percentage of the sequencing reads for kindred cell IL11 (b), IL12 (c) and IL1C itself (a). There is little bias in the clone, moderate bias in cell IL11 and severe bias in cell IL12. “Densities” of y-axis are kernel density estimates using “density” function in R software with default parameters.
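
For readers who want to reproduce such density curves outside R, the sketch below approximates the defaults of R's "density" function, which are a Gaussian kernel with the "nrd0" rule-of-thumb bandwidth. It evaluates the estimate directly rather than on R's binned 512-point grid, so small numerical differences from R are to be expected; the allele fractions are made-up values.

```python
import math
import statistics


def bw_nrd0(values):
    """Rule-of-thumb bandwidth that R's density() uses by default."""
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Same fallback chain as R: use sd, then |x[0]|, then 1 if the spread is 0.
    spread = min(sd, (q3 - q1) / 1.34) or sd or abs(values[0]) or 1.0
    return 0.9 * spread * len(values) ** -0.2


def gaussian_kde(values, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    h = bandwidth if bandwidth is not None else bw_nrd0(values)
    norm = len(values) * h * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values) / norm
        for x in grid
    ]


# Hypothetical allelic fractions (% of reads carrying the variant).
fractions = [35.0, 42.0, 48.0, 50.0, 51.0, 55.0, 63.0]
grid = [float(x) for x in range(0, 101, 10)]
density = gaussian_kde(fractions, grid)
```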

Supplementary Figure 6 Correlation of observed and estimated major allele fraction by ‘leave-100-out cross-validations’.

The cross-validations were performed on cells IL11 and IL12 with two settings of λ (a-d) for a randomly selected region (chr1:100,000,000-110,000,000). λ denotes half of the window width used for smoothing (Methods). Amplification bias of IL12 was predicted more accurately than that of IL11 because there is more amplification bias in IL12; hence, known heterozygous SNPs of IL12 are more informative about bias.
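
The idea of predicting held-out hSNPs from their neighbours can be sketched as follows. This is a simplified illustration, not SCcaller's implementation: the Epanechnikov kernel, the use of consecutive blocks as held-out sets, the fallback value of 0.5 and all function names are assumptions made here; the kernel and procedure actually used are described in the Methods.

```python
def epanechnikov(u):
    """Kernel weight for a scaled distance u (zero outside |u| < 1)."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def smoothed_allele_fraction(position, hsnps, lam, default=0.5):
    """Kernel-weighted mean allele fraction of hSNPs within lam of position.

    hsnps: list of (position, allele_fraction) tuples.
    lam: half of the smoothing window width, in base pairs.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for snp_pos, fraction in hsnps:
        w = epanechnikov((snp_pos - position) / lam)
        weighted_sum += w * fraction
        weight_total += w
    # With no informative hSNP in the window, fall back to "no bias".
    return weighted_sum / weight_total if weight_total > 0 else default


def leave_k_out(hsnps, lam, k=100):
    """Predict each held-out block of k hSNPs from the remaining hSNPs."""
    pairs = []  # (observed, estimated) allele fractions
    for start in range(0, len(hsnps), k):
        held_out = hsnps[start:start + k]
        training = hsnps[:start] + hsnps[start + k:]
        for pos, observed in held_out:
            estimate = smoothed_allele_fraction(pos, training, lam)
            pairs.append((observed, estimate))
    return pairs
```

Plotting the observed value against the estimate for every pair gives the kind of correlation shown in the panels; a larger λ averages over more hSNPs at the cost of spatial resolution.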

Supplementary Figure 7 Steps in variant calling by SCcaller.

First, local allelic bias is estimated using a kernel smoother based on hSNPs. Second, SNVs are identified using a likelihood ratio test. Third, SNVs not present in the bulk alignment are designated as true somatic SNVs rather than SNPs.
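
The third step amounts to a subtraction of whatever is already visible in bulk. A minimal sketch follows, with the simplifying assumption that the bulk evidence has been reduced to a set of variant tuples (the legend refers to the bulk alignment itself); the function name and example variants are hypothetical.

```python
def somatic_snvs(single_cell_calls, bulk_variants):
    """Keep single-cell SNV calls that are absent from the bulk sample.

    Both arguments are iterables of (chrom, pos, ref, alt) tuples.
    """
    bulk = set(bulk_variants)
    return [call for call in single_cell_calls if call not in bulk]


# Hypothetical calls: the first is also seen in bulk, so it is treated
# as a germline SNP; the second is kept as a candidate somatic SNV.
cell_calls = [("chr1", 100200, "C", "T"), ("chr1", 100950, "G", "A")]
bulk_calls = [("chr1", 100200, "C", "T")]
print(somatic_snvs(cell_calls, bulk_calls))
```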

Supplementary Figure 8 Distribution of log likelihood ratios of variant-calling models.

The ratio between the likelihoods of the null model (L0) and of the heterozygous SNV model (L1) was plotted. The black dashed line indicates the cutoff for the log likelihood ratio, which corresponds to an alpha level of 0.01 in a likelihood ratio test. These results indicate that this test and its cutoff separate real mutations from artifacts. Real mutations and amplification artifacts were determined by comparing data from single cells to their kindred clone (Fig. 1b). “Densities” of y-axis are kernel density estimates using “density” function in R software with default parameters.
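
A log likelihood ratio of this kind can be illustrated with binomial read-count models. The sketch below is an assumption-laden simplification: the expected alternate-allele fractions `p_null` and `p_het`, the `cutoff` value and the function names are hypothetical, and SCcaller's actual models and cutoff calibration are those given in the Methods.

```python
import math


def binom_loglik(k, n, p):
    """Log probability of k variant reads out of n under a binomial model."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite at the boundaries
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def log_likelihood_ratio(alt, depth, p_null, p_het):
    """log(L0 / L1): null model versus heterozygous SNV model."""
    return binom_loglik(alt, depth, p_null) - binom_loglik(alt, depth, p_het)


def call_het_snv(alt, depth, p_null, p_het, cutoff):
    """Call a heterozygous SNV when the null model is sufficiently unlikely."""
    return log_likelihood_ratio(alt, depth, p_null, p_het) < cutoff


# Hypothetical sites at depth 30, with 12.5% expected under the null
# model and 50% under the heterozygous model (no local allelic bias).
print(call_het_snv(15, 30, p_null=0.125, p_het=0.5, cutoff=-3.0))
print(call_het_snv(3, 30, p_null=0.125, p_het=0.5, cutoff=-3.0))
```

In practice `p_het` would be replaced by the locally estimated allelic bias, so that a site in a strongly biased region is not rejected merely because its variant fraction is far from 50%.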

Supplementary Figure 9 Spectra and numbers of somatic SNVs identified with different variant callers.

Mutation spectra of candidate somatic SNVs called by (a) Monovar; (b) MuTect; and (c) VarScan. The results from SCcaller are shown in Fig. 3b. The error bars indicate standard deviations. (d) The number of candidate somatic SNVs per cell called by each variant caller. The error bars indicate standard deviations. Sample size n=4, 6 and 2 for the clones, SCMDA and HighTemp MDA respectively.

Supplementary Figure 10 Somatic SNVs in the functional genome.

(a) Enrichment and depletion of somatic SNVs in genomic features. The asterisks indicate a significant depletion (P < 0.01, two-tailed pair-wise t test) compared to the genome average. Data on germline polymorphisms were obtained from the 1000 Genomes Project. The error bars indicate standard deviations. Sample size n=10 for the somatic SNVs, i.e., 4 clones and 6 single cells. (b) Mutant genes are less highly expressed than wildtype genes. Average FPKM (Fragments Per Kilobase Of Exon Per Million Fragments Mapped) value (red dashed line) of genes affected by somatic SNVs in their exon regions was compared with the average FPKM values (black line) of 2,000 random gene sets (same number of genes as the mutant gene set). P value was calculated from the permutation (one-tailed). The RNA sequencing data were downloaded from the ENCODE project (ID: ENCFF640FPG and ENCFF704TVE). “Densities” of y-axis are kernel density estimates using “density” function in R software with default parameters.
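
The permutation comparison in panel (b) can be sketched as below. The function name, the fixed random seed, the add-one correction and the toy expression table are assumptions of this sketch; the legend specifies only that 2,000 random gene sets of matching size were used and that the P value is one-tailed.

```python
import random
import statistics


def permutation_p_value(expression, mutant_genes, n_sets=2000, seed=0):
    """One-tailed P value for 'mutant genes are less expressed than random'.

    expression: dict mapping gene name to an expression value (e.g. FPKM).
    mutant_genes: genes carrying somatic SNVs in their exons.
    """
    rng = random.Random(seed)
    genes = sorted(expression)
    observed = statistics.fmean(expression[g] for g in mutant_genes)
    as_low = 0
    for _ in range(n_sets):
        sample = rng.sample(genes, len(mutant_genes))
        if statistics.fmean(expression[g] for g in sample) <= observed:
            as_low += 1
    # Add-one correction (an assumption) keeps the estimate above zero.
    return (as_low + 1) / (n_sets + 1)


# Toy data: 100 genes with expression 1..100; the "mutant" set holds
# the three least-expressed genes.
toy_expression = {f"g{i}": float(i) for i in range(1, 101)}
p_low = permutation_p_value(toy_expression, ["g1", "g2", "g3"])
```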

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10, Supplementary Tables 1–5 and Supplementary Note (PDF 1473 kb)

Supplementary Software

SCcaller (version 1.0) (ZIP 22 kb)

Supplementary Protocol

Single-Cell Multiple Displacement Amplification (SCMDA) Protocol (PDF 143 kb)

About this article

Cite this article

Dong, X., Zhang, L., Milholland, B. et al. Accurate identification of single-nucleotide variants in whole-genome-amplified single cells. Nat Methods 14, 491–493 (2017). https://doi.org/10.1038/nmeth.4227
