Single-cell DNA and RNA sequencing of circulating tumor cells

Single-cell sequencing of circulating tumor cells can precisely represent tumor heterogeneity and provide useful information for cancer treatment and research. After spiking TGW neuroblastoma cells into blood derived from healthy volunteer, the cells were isolated by fluorescence-activated cell sorting. DNA and mRNA were amplified by four different whole-genome amplifications (WGA) and three whole-transcriptome amplifications (WTA) methods, followed by single-cell DNA and RNA sequencing. Multiple displacement amplification (MDA)-based WGA methods showed higher amplification efficiency than other methods with a comparable depth of coverage as the bulk sample. The uniformity of coverage greatly differed among samples (12.5–89.2%), with some samples evaluated by the MDA-based WGA method using phi29 DNA polymerase and random primers showing a high (> 80%) uniformity of coverage. The MDA-based WTA method less effectively amplified mRNA and showed non-specific gene expression patterns. The PCR-based WTA using template switching with locked nucleic acid technology accurately amplified mRNA from a single cell. Taken together, our results present a more reliable and adaptable approach for CTC profiling at the single-cell level. Such molecular information on CTCs derived from clinical patients will promote cancer treatment and research.

Method X: PCR-based, template switching and LNA technology (n = 8) Method Y: PCR-based, template switching (n = 8) Method Z: MDA-based, oligo dT primer and phi29 DNA polymerase (n = 8) Figure 1. Experimental design. TGW neuroblastoma cells were spiked into blood derived from a healthy volunteer and isolated by fluorescence-activated cell sorting (FACS) using anti-GD2 and CD90 antibodies. We amplified DNA and mRNA from isolated single TGW cells using several different WGA and WTA methods. We performed a quality check (QC) of the WGA samples by multiplex qPCR of eight cancer-related genes, and QC criteria was set as threshold cycles (Ct values) ≤ 30 in all the genes. Next, we performed single-cell sequencing and evaluated the depth and uniformity of coverage for WGA samples, the number of sequenced reads from transcripts and gene expression patterns for WTA samples. FACS fluorescence-activated cell sorting, WGA whole-genome amplification, WTA whole-transcriptome amplification, PCR polymerase chain reaction, MDA multiple displacement amplification, LNA locked nucleic acid technology. www.nature.com/scientificreports/ (mean ± SD, n = 7) by method C (MDA-based), and 0.17 ± 0.004 µg (mean ± SD, n = 7) by method D (PCR with MDA hybridized). Method C (MDA-based) showed high productivity, but the amounts of WGA samples obtained by method D (PCR with MDA hybridized) were too low to be evaluated by multiplex qPCR for QC (Fig. 2a).
Quality of WGA sample. Seven of eight samples passed QC in method A (PCR-based), all eight samples passed QC in method B (MDA-based), and six of seven samples passed QC in method C (MDA-based). The Ct values of the eight genes and their deviation were increased in WGA samples, indicating low quality (Fig. 2b).
The Ct values of the eight cancer-related genes in each WGA sample are shown in Table S2.
Performance of NGS in WGA sample. In the bulk sample, the total number of sequenced reads were 15.0 × 10 6 , depth of coverage was 46.4, and uniformity of coverage was 94.8%. In the QC passed WGA samples (n = 7) by method A (PCR-based), the number of total sequenced reads was 4.5 × 10 6 ± 1.2 × 10 6 (mean ± SD), depth of coverage was 8.3 ± 1.9 (mean ± SD) and uniformity of coverage was 40.1. ± 16.7% (mean ± SD). In  www.nature.com/scientificreports/ the QC passed WGA samples (n = 8) by method B (MDA-based), the numbers of total sequenced reads were 18 × 10 6 ± 5.0 × 10 6 (mean ± SD), depth of coverage was 45.7 ± 15.2 (mean ± SD) and uniformity of coverage was 32.8. ± 14.5% (mean ± SD). In the QC passed WGA samples (n = 6) by method C (MDA-based), the number of total sequenced reads was 21 × 10 6 ± 4.5 × 10 6 (mean ± SD), depth of coverage was 68.9 ± 18.8 (mean ± SD) and uniformity of coverage was 71.5 ± 24.6% (mean ± SD). Method A (PCR-based) yielded a substantially smaller number of sequenced reads compared to methods B and C (MDA-based), and methods B and C (MDA-based) yielded a comparable number of sequenced reads as the bulk sample. Consistent with the number of sequenced reads, methods B and C (MDA-based) yielded a comparable depth of coverage with the bulk sample, and method A (PCR-based) showed a substantially lower depth of coverage (Fig. 3a,b).
The uniformity of coverage in the bulk sample was over 90%; in contrast, the uniformity of coverage significantly differed among WGA samples (12.5-89.2%) despite passing QC. Method C (MDA-based) yielded higher uniformity of coverage than method A (PCR-based) and method B (MDA-based), and method C showed over 80% uniformity of coverage (Fig. 3c). The deviation in Ct values of the eight genes in qPCR correlated well with the uniformity of coverage in NGS (R = −0.66), and the low deviation of Ct values of the eight genes contributed to a high uniformity of coverage (Fig. 3d).
Performance of NGS in WTA sample. In the bulk samples (× 3 technical replicates), the number of sequenced reads from transcripts was 30.6 × 10 6 and number of expressed genes was 12,175. In WTA samples (n = 8) by method X (PCR-based), the number of sequenced reads from transcripts was 2.8 × 10 6 ± 0.4 × 10 6 (mean ± SD) and number of expressed genes was 5283 ± 392 (mean ± SD). In WTA samples (n = 8) by method Y (PCR-based), the number of sequenced reads from transcripts was 2.3 × 10 6 ± 0.2 × 10 6 (mean ± SD) and number of expressed genes was 3426 ± 513 (mean ± SD). In WTA samples (n = 8) by method Z (MDA-based), the number www.nature.com/scientificreports/ of sequenced reads from transcripts was 1.0 × 10 6 ± 0.5 × 10 6 (mean ± SD) and number of expressed genes was 3006 ± 1357 (mean ± SD). The number of sequenced reads from transcripts by method Z (MDA-based) was significantly lower than that of the other methods (PCR-based) (Fig. 4a). Most expressed genes in a single cell can be detected with 0.5 × 10 6 reads, and almost all samples yielded over 0.5 × 10 6 reads from the transcripts 13 . Consistent with the previous study, the number of expressed genes was comparable between each sample, and method X (PCR-based) amplified a larger number of expressed genes to some extent compared to the other methods (Fig. 4b).
We performed correlation analysis of the gene expression patterns between the bulk sample and WTA samples by each method and obtained correlation coefficients of 0.70 (method X, PCR-based), 0.47 (method Y, PCRbased), and 0.26 (method Z, MDA-based) (Fig. 4c). The heatmap of gene expression patterns showed that WTA samples by method X (PCR-based) closely resembled the bulk sample, whereas WTA samples by method Z (MDA-based) exhibited non-specific gene expression patterns (Fig. 4d). In WTA samples by method Z (MDAbased), genes showing high expression in the bulk sample were expressed at low levels; whereas those with low expression levels in the bulk sample were highly expressed in WTA samples.

Discussion
It is known that PCR-based and PCR with MDA-hybridized WGA methods generate high amplification uniformity, whereas the MDA-based WGA method shows high amplification efficiency but generates amplification bias through high processive amplification by phi29 DNA polymerase, contributing to lower amplification uniformity 5 . In this study, MDA-based WGA methods showed higher amplification efficiency than the PCRbased WGA method and produced a comparable depth of coverage with the bulk sample, but the uniformity of coverage varied widely among samples. As a QC criteria, we used Ct values ≤ 30 in all eight cancer-related    www.nature.com/scientificreports/ genes by multiplex qPCR according to a previous study 11 ; however, this criteria was not adequate for selecting samples with high uniformity of coverage. The deviation in Ct values of the eight genes correlated well with the uniformity of coverage, and a low deviation may be useful for QC evaluation of WGA samples. Interestingly, in the samples by the PCR-based WGA method, the Ct values showed very high deviation, but the uniformity of coverage was not as low. A low depth of coverage contributed to the reduced uniformity of coverage, but the high deviation in Ct values may have been caused by enzymatic digestion before adaptor ligation to DNA fragments in this workflow. Site-specific DNA digestion may result in fewer amplification products and DNA fragments that are too small or large 5 , and performing QC by multiplex qPCR may not be suitable for such a PCR-based WGA method. We also compared two different MDA-based WGA methods. Both high processive amplification by phi29 DNA polymerase and the priming inequality of random primers can cause amplification bias 9 ; therefore, we compared the combinations of phi29 DNA polymerase and random primers or DNA primase, TthPrimPol. Consistent with a previous study 14 , TthPrimPol showed low amplification uniformity, and the REPLI-g Advanced DNA Single Cell Kit (method C) using a combination of phi29 DNA polymerase and random primers showing higher uniformity. The MDA-based WGA method using a combination of high-fidelity phi29 DNA polymerase and random primers along with stricter QC criteria by multiplex qPCR may show high performance for comprehensive genetic profiling of CTCs at the single-cell level.
In contrast to the WGA method, the MDA-based WTA method showed lower amplification efficiency than the PCR-based WTA methods. A read count of 0.5 × 10 6 from transcripts is thought to cover most expressed genes in a single cell, and the number of expressed genes in WTA samples by each method was not as different as the number of sequenced reads. However, the MDA-based WTA method produced non-specific gene expression patterns, which possibly occur because of amplification bias to shorter or fragmented low-quality mRNA caused by phi29 polymerase and linear cDNA amplification. CEL-seq using the T7 promoter and linear amplification by in vitro transcription is similar to the MDA-based WTA method and accurately quantified mRNA levels using unique molecular identifiers (UMIs) in a previous study 15 . UMIs are stretches of 4-10 random nucleotides integrated into sequencing primers and serve as a random barcode for each mRNA molecule 16 . By counting each UMI only once rather than counting the total sequenced reads, amplification bias can be eliminated 16 . If we use the MDA-based WTA method, the integration of UMIs such as CEL-seq may improve the accuracy of the gene expression data by reducing amplification bias. In contrast, the SMART-Seq HT Kit (method X, PCRbased) showed higher amplification efficiency than the other methods, and the gene expression pattern was wellcorrelated with that in the bulk sample. The SMART-Seq HT Kit (method X) and method Y were built on the same SMART-seq (switching mechanism at the 5′ end of the RNA transcript) technique. The differences between these methods are that only the SMART-Seq HT kit uses LNA technology and the hands-on time is shorter than method Y and LNA technology may contribute to efficient and accurate cDNA synthesis and amplification.
In conclusion, we performed single-cell sequencing of CTCs using a combination of FACS and various WGA and WTA methods. The MDA-based WGA method using the combination of phi29 DNA polymerase and random primers showed high performance for single-cell DNA sequencing, whereas the PCR-based WTA method using the combination of template switching and LNA technology showed high performance for single-cell RNA sequencing. Although we present a more reliable and adaptable approach for CTC profiling at the single-cell level, further investigation of other metrics such as the accuracy of variant calling or tools of QC of WTA samples to perform single-cell sequencing of CTCs derived from clinical patients should be performed. Such molecular information on CTCs derived from clinical patients will promote cancer treatment and research.

Methods
Extraction of DNA and RNA from bulk of TGW cells. The human neuroblastoma cell line, TGW, was obtained from the American Type Culture Collection (Manassas VA, USA) and maintained as recommended previously 17 . DNA and RNA were extracted from 1 × 10 6 TGW cells using DNA Extractor WB Kit (FUJIFILM Wako Pure Chemical Corporation, Osaka, Japan) and miRNeasy Mini kit (Qiagen, Hilden, Germany) according to the manufacturer's protocols.
WGA . Ampli1 WGA Kit (Silicon Biosystems, Castel Maggiore, Italy), named as method A, is a PCR-based method using a linker adaptor with a universal sequence and single primer. TruePrime Single Cell WGA Kit (Sygnis, Heidelberg, Germany), named as method B, is an MDA-based method using phi29 DNA polymerase and DNA primase, TthPrimPol, rather than artificial primers. REPLI-g Advanced DNA Single Cell Kit (Qiagen), named as method C, is an MDA-based method using phi29 DNA polymerase and random primers. MALBAC Single Cell WGA Kit (Yikon Genomics, Jiangsu, China), named as method D, is a PCR with MDA hybridized method, and generates looped DNA molecules via eight cycles of multiple displacement preamplification using specifically designed MALBAC primers and Bst DNA polymerase. The looped amplicons are further amplified by PCR 10 . The WGA product concentration was measured using Qubit fluorometer 3.0 (Thermo Fisher Scien- Libraries for RNA sequencing of the bulk sample were prepared using TruSeq Stranded mRNA Library Prep Kit (Illumina). Library preparation for RNA sequencing of the WTA sample by method X was performed using Nextera XT DNA Library Preparation Kit (Illumina) according to the manufacturer's instructions. The libraries of the bulk sample and WTA samples by methods X and Z were equally pooled at a final loading concentration of 8.5 pM and paired-end 75 bp sequencing was performed by Hiseq 2500 (Illumina). The libraries of WTA samples by method Y were equally pooled at a final loading concentration of 12 pM and paired-end 75 bp sequencing was performed by Miseq (Illumina). Data analysis. FASTQ files of DNA sequencing were imported into CLC Genomics Workbench version 11.0.1 (CLC Bio, Aarhus, Denmark) and sequenced reads were aligned to the UCSC hg19 reference genome. The uniformity of coverage was calculated as the percentage of sequenced base positions in which the depth of coverage was greater than 0.2 × the mean depth of coverage 18 .
FASTQ files of RNA sequencing were processed using Strand NGS version 2.7 (Strand Life Sciences, Bangalore, India), and sequenced reads were aligned to the UCSC hg19 reference genome. We selected reads aligned to exonic regions and defined transcripts with over 100 sequenced reads as the expressed genes in each sample. Statistical analysis. Student t-test was used to compare each WGA, WTA method and statistical significance was defined as P < 0.05. The correlation coefficient between uniformity of coverage and deviation in the Ct values of the eight genes in QC was determined by Pearson's test. In RNA sequencing, sequenced reads from transcripts were normalized by DESeq, and correlation analysis of the gene expression patterns between the bulk sample and WTA samples was performed by Spearman's test. Other data were reported as the mean ± SD, and analyses were performed using Microsoft Excel software.

Data availability
The multiplex qPCR data analyzed in this study are included in this published article and supplementary materials. The sequencing data analyzed in this study are not publicly available but can be obtained from the corresponding author.