Massively multiplex single-cell Hi-C

Journal name: Nature Methods
Volume: 14
Pages: 263–266
DOI: 10.1038/nmeth.4155

We present single-cell combinatorial indexed Hi-C (sciHi-C), a method that applies combinatorial cellular indexing to chromosome conformation capture. In this proof of concept, we generate and sequence six sciHi-C libraries comprising a total of 10,696 single cells. We use sciHi-C data to separate cells by karyotypic and cell-cycle state differences and identify cell-to-cell heterogeneity in mammalian chromosomal conformation. Our results demonstrate that combinatorial indexing is a generalizable strategy for single-cell genomics.

At a glance

Figures

  1. Figure 1: sciHi-C integrates in situ Hi-C with combinatorial cellular indexing to generate signal-rich bulk Hi-C maps that can be decomposed into single-cell Hi-C maps.

    (a) sciHi-C follows the traditional paradigm of fixation, digestion, and religation shared by all Hi-C assays (steps 1–4), but it uses a biotinylated bridge adaptor to incorporate a first round of barcodes in bulk before proximity ligation (step 3) and custom-barcoded Illumina Y-adaptors (step 5) to incorporate a second round of barcodes in diluted, redistributed, and lysed nuclei (one barcode per ~25 nuclei) before affinity purification and library amplification (steps 5 and 6). The vast majority of resulting molecules will harbor one unique pair of barcodes per single cell. RE, restriction enzyme. (b) Bulk data generated by this protocol can be decomposed to single-cell Hi-C maps. (c) sciHi-C libraries demonstrate a high cis:trans ratio, measured as the ratio of intrachromosomal contacts >20 kb apart to interchromosomal contacts. (d) The high cis:trans ratio observed in bulk data is maintained after libraries are all decomposed to ~1,800 cellular indices (each with ≥1,000 unique reads).
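The cis:trans ratio described in panel (c) can be illustrated with a minimal sketch. It assumes contacts are available as `(chrom1, pos1, chrom2, pos2)` tuples; that layout and the function name `cis_trans_ratio` are hypothetical, and only the >20 kb cutoff comes from the caption.

```python
def cis_trans_ratio(contacts, min_cis_distance=20_000):
    """Ratio of long-range intrachromosomal to interchromosomal contacts.

    contacts: iterable of (chrom1, pos1, chrom2, pos2) tuples (assumed layout).
    """
    cis = 0
    trans = 0
    for chrom1, pos1, chrom2, pos2 in contacts:
        if chrom1 != chrom2:
            trans += 1
        elif abs(pos2 - pos1) > min_cis_distance:
            cis += 1
        # Intrachromosomal contacts at or below the cutoff are not counted.
    if trans == 0:
        # The ratio is unbounded or undefined without interchromosomal contacts.
        return float("inf") if cis else float("nan")
    return cis / trans


example_contacts = [
    ("chr1", 100_000, "chr1", 500_000),      # cis, 400 kb apart: counted
    ("chr1", 100_000, "chr1", 105_000),      # cis, 5 kb apart: ignored
    ("chr2", 1_000_000, "chr2", 9_000_000),  # cis, 8 Mb apart: counted
    ("chr1", 100_000, "chr3", 200_000),      # trans
]
ratio = cis_trans_ratio(example_contacts)
print(ratio)
```

The same function could be applied either to a bulk library or to the contacts of one cellular index, which is how panels (c) and (d) differ.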

  2. Figure 2: Cellular indices generated through sciHi-C are overwhelmingly species specific and can be separated by cell type.

    (a) In libraries ML1 and ML2, similar levels of collision (defined as any cellular index with at least 1,000 unique reads but <95% species purity) are observed, and they fall within the expected range. (b) Species contamination visualized as a histogram of the fraction of reads mapping to the human genome (only cellular indices with ≥1,000 reads shown). (c) Projection onto the first two principal components from PCA of interchromosomal contact matrices separates HeLa S3 and HAP1, two karyotypically different cell lines (n = 3,609 cells). Percentages shown are the percentage of variance explained by each plotted component. (d) Principal component 2 loadings represent the contribution of each feature (interchromosomal contact) to the observed cell-type separation. Known translocations for each cell type are mirrored against the loading heatmap.
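The collision definition in panel (a) can be sketched as a small classifier. The thresholds (at least 1,000 unique reads, <95% species purity) come from the caption; the function name `classify_index`, the per-species read counts as inputs, and the returned labels are assumptions made for illustration.

```python
def classify_index(human_reads, mouse_reads, min_reads=1000, min_purity=0.95):
    """Label one cellular index from its per-species unique read counts.

    Returns "low_coverage", "collision", "human", or "mouse"
    (hypothetical labels, not taken from the paper).
    """
    total = human_reads + mouse_reads
    if total < min_reads:
        return "low_coverage"
    # Species purity: fraction of reads assigned to the majority species.
    purity = max(human_reads, mouse_reads) / total
    if purity < min_purity:
        return "collision"
    return "human" if human_reads >= mouse_reads else "mouse"


labels = [
    classify_index(990, 10),    # 99% human
    classify_index(600, 400),   # 60% purity: collision
    classify_index(10, 5),      # too few reads to call
    classify_index(40, 960),    # 96% mouse
]
print(labels)
```

The fraction of indices labeled "collision" among those passing the read cutoff is the quantity compared across libraries in panel (a).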

  3. Figure 3: sciHi-C of nocodazole-arrested HeLa S3 cells enables in silico sorting by cell-cycle progression.

    (a) Mean contact probability and s.d. as a function of genomic distance for single HeLa S3 cells from a population treated with nocodazole (n = 588 cells containing at least 5,000 contacts and harboring nocodazole-experiment-specific programmed barcodes), as well as shuffled controls generated by random reassignment of cellular indices. (b) Scaling coefficients for 588 single HeLa S3 cells follow a bimodal distribution. (c) Cells can be 'sorted' in silico to generate two distinct contact probability maps, shown here for HeLa chromosome 12. The labels of both axes indicate chromosomal position.
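One way to picture a per-cell scaling coefficient is as the slope of contact probability against genomic distance on log-log axes. The sketch below assumes an ordinary least-squares fit, which the caption does not specify; the 50 kb to 8 Mb window is the range quoted in the Supplementary Fig. 13 caption, and the function name `scaling_coefficient` is hypothetical.

```python
import math


def scaling_coefficient(distances, probabilities,
                        min_dist=50_000, max_dist=8_000_000):
    """Least-squares slope of log10(probability) versus log10(distance).

    distances, probabilities: per-bin genomic distance (bp) and contact
    probability for one cell. Empty bins (probability 0) are skipped.
    """
    points = [
        (math.log10(d), math.log10(p))
        for d, p in zip(distances, probabilities)
        if min_dist <= d <= max_dist and p > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two usable distance bins")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all usable bins share the same distance")
    return sxy / sxx


# Synthetic cell whose contact probability decays as 1/distance.
bin_distances = [50_000 * 2 ** k for k in range(8)]   # 50 kb to 6.4 Mb
bin_probabilities = [1_000 / d for d in bin_distances]
slope = scaling_coefficient(bin_distances, bin_probabilities)
print(round(slope, 6))
```

Computing one such slope per cell and plotting the resulting histogram would give a distribution of the kind shown in panel (b).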

  4. Supplementary Fig. 1: Nuclei remain intact through proximity ligation in the combinatorial single cell Hi-C protocol

    Phase contrast microscopy of HeLa S3 and HAP1 nuclei following proximity ligation and serial dilution shows that nuclei remain intact throughout the combinatorial single cell Hi-C protocol (scale bar = 100 μm).

  5. Supplementary Fig. 2: Coverage of combinatorial single cell Hi-C cellular indices follows a bimodal distribution.

    A histogram of the coverage (i.e., number of unique reads) of combinatorial single cell Hi-C cellular indices in two replicate libraries reveals a bimodal distribution, in which low-coverage cellular indices likely represent barcoding of free DNA in solution rather than intact nuclei.

  6. Supplementary Fig. 3: Coverage of cellular indices is not correlated between replicate experiments

    Scatter plot of coverage per cellular index for all cellular indices with at least 1 unique read in both replicate combinatorial single cell Hi-C libraries. A Pearson’s r of -0.03 suggests that there is minimal intrinsic bias (i.e., a “barcode” effect) in the coverage of particular cellular indices.
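The replicate comparison above amounts to a Pearson correlation over cellular indices seen in both libraries. A minimal sketch follows, assuming coverage is stored as a dictionary from cellular index to unique read count; the helper names and the toy values are illustrative and are not data from the study.

```python
import math


def shared_coverage(rep1, rep2):
    """Paired coverage for indices with at least 1 unique read in both replicates."""
    shared = sorted(k for k in rep1
                    if k in rep2 and rep1[k] >= 1 and rep2[k] >= 1)
    return [rep1[k] for k in shared], [rep2[k] for k in shared]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy coverage tables keyed by a made-up cellular index label.
replicate_1 = {"idx_a": 1200, "idx_b": 40, "idx_c": 3100, "idx_d": 0}
replicate_2 = {"idx_a": 35, "idx_b": 2800, "idx_c": 900, "idx_e": 500}
xs, ys = shared_coverage(replicate_1, replicate_2)
r = pearson_r(xs, ys)
print(xs, ys)
```

A value of r near zero, as reported in the caption, would indicate that a barcode pair with high coverage in one replicate is not predisposed to high coverage in the other.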

  7. Supplementary Fig. 4: Single cellular indices demonstrate high cis:trans ratios.

    Histogram of the cis:trans ratios for cellular indices over two biological replicates. High cis:trans ratios suggest that nuclei remain intact during the protocol and hint at a single-cell origin for the majority of cellular indices.

  8. Supplementary Fig. 5: Quality control statistics for PL1 and PL2 libraries are similar to primary experiment libraries.

    a.) Violin plots showing the distribution of ligation types across all cellular indices with at least 1,000 reads in libraries PL1 and PL2. b.) Species specificity for both libraries.

  9. Supplementary Fig. 6: The HeLa genotype enables further filtration of potential barcode collisions in combinatorial single cell Hi-C datasets.

    We examined all homozygous non-reference sites determined by Adey et al. and tabulated the fraction of sites where the non-reference allele was found in our sequencing reads, with the expectation that single HeLa cells should have a very high fraction (i.e., ≥99%) of homozygous non-reference alleles at those sites, with reduced fractions indicating contamination by HAP1. For this study, we drew conservative cutoffs of 57% and 99% for each species (i.e. any cellular indices falling between these values were discarded).
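The genotype filter can be sketched as a three-way call on the fraction of covered sites carrying the non-reference allele. The 57% and 99% cutoffs are from the caption; labeling indices at or below 57% as HAP1 is an interpretation of the caption rather than something it states outright, and the function name and inputs are hypothetical.

```python
def genotype_call(nonref_sites, covered_sites, low=0.57, high=0.99):
    """Call one cellular index from HeLa homozygous non-reference sites.

    nonref_sites: covered sites where the non-reference allele was observed.
    covered_sites: HeLa homozygous non-reference sites covered by any read.
    """
    if covered_sites == 0:
        return "uninformative"
    fraction = nonref_sites / covered_sites
    if fraction >= high:
        return "HeLa"
    if fraction <= low:
        # Assumption: the low tail is treated as HAP1.
        return "HAP1"
    # Intermediate fractions suggest a mixed (collided) index.
    return "discard"


calls = [
    genotype_call(995, 1000),  # 99.5% non-reference
    genotype_call(300, 1000),  # 30% non-reference
    genotype_call(800, 1000),  # 80%: between the cutoffs
    genotype_call(0, 0),       # no informative sites
]
print(calls)
```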

  10. Supplementary Fig. 7: Raw single cell matrices used as input for PCA.

    To generate these matrices, we took single-cell contact maps and vectorized them, such that each cell is represented by a vector of non-redundant contact counts between two loci. For interchromosomal analysis, each vector contained the log10 transformed number of raw counts between two chromosomes; for intrachromosomal analysis, each vector contained a 1 if a contact between two 10 Mb intrachromosomal windows was observed, and 0 if not. These vectors were then concatenated to form the heatmaps above. The pairwise bin ID simply represents a label for each pair of interacting windows represented in the heat maps. a.) A heat map representation of a portion (250 cells) of the input interchromosomal matrix for PCA. Rows represent single human cells, while columns represent pairwise interactions between two whole chromosomes. For this analysis, raw counts were used, and n = 3,609 cells. b.) Heat map representation of a portion (2,500 cells) of the input intrachromosomal matrix for PCA. Here, interchromosomal counts were ignored, and interaction frequencies between discrete 10 Mb windows genome-wide were reduced to a binary representation (i.e. 1 if present, 0 if absent). Again, n = 3,609 cells.
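The vectorization described above can be sketched for a single cell as follows. Contacts are assumed to be `(chrom1, pos1, chrom2, pos2)` tuples. The pseudocount of 1 before the log10 transform and the inclusion of same-window pairs are assumptions made so the sketch is well defined; the caption does not state how zero counts or same-window contacts were handled.

```python
import math
from itertools import combinations, combinations_with_replacement


def interchromosomal_vector(contacts, chromosomes):
    """log10(count + 1) for every unordered pair of distinct chromosomes."""
    order = {chrom: i for i, chrom in enumerate(chromosomes)}
    pairs = list(combinations(chromosomes, 2))
    counts = {pair: 0 for pair in pairs}
    for chrom1, _pos1, chrom2, _pos2 in contacts:
        if chrom1 == chrom2 or chrom1 not in order or chrom2 not in order:
            continue
        pair = tuple(sorted((chrom1, chrom2), key=order.__getitem__))
        counts[pair] += 1
    return [math.log10(counts[pair] + 1) for pair in pairs]


def intrachromosomal_vector(contacts, chrom_sizes, window=10_000_000):
    """1 if any contact links two windows on the same chromosome, else 0."""
    features = []
    for chrom, size in chrom_sizes.items():
        n_windows = -(-size // window)  # ceiling division
        features.extend(
            (chrom, i, j)
            for i, j in combinations_with_replacement(range(n_windows), 2)
        )
    seen = set()
    for chrom1, pos1, chrom2, pos2 in contacts:
        if chrom1 != chrom2 or chrom1 not in chrom_sizes:
            continue
        i, j = sorted((pos1 // window, pos2 // window))
        seen.add((chrom1, i, j))
    return [1 if feature in seen else 0 for feature in features]


# Toy cell: nine chr2-chr1 contacts and one long-range contact on chrA.
toy_contacts = [("chr2", 5_000, "chr1", 7_000)] * 9
toy_contacts.append(("chrA", 22_000_000, "chrA", 3_000_000))
inter = interchromosomal_vector(toy_contacts, ["chr1", "chr2", "chr3"])
intra = intrachromosomal_vector(toy_contacts, {"chrA": 25_000_000})
print(inter, intra)
```

Stacking one such vector per cell as rows gives a cells-by-features matrix of the kind shown in the heat maps, which can then be passed to any PCA routine.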

  11. Supplementary Fig. 8: The first component of PCA using both interchromosomal contacts and 10 Mb windowed intrachromosomal contacts strongly correlates with coverage.

    a.) Correlation between the principal component 1 (PC1) and coverage for interchromosomal interactions (ρ = -0.917). b.) Correlation between the principal component 1 (PC1) and coverage for interacting 10 Mb intrachromosomal windows (ρ = 0.897).

  12. Supplementary Fig. 9: Analysis of principal component loadings for interchromosomal separation experiment reveals that translocations contribute to cell type separation in principal component space.

    a.) Heat map of loadings for principal component 2 after all known translocations (blacked out entries) are removed from the analysis. b.) After removing all entries corresponding to known translocations from the interchromosomal single-cell Hi-C contact matrix, cell-type separation using PC1 and PC2 is qualitatively worse but still apparent, suggesting that cell-type specific interchromosomal contacts may contribute to the observed separation pattern. Percentages shown are the percentage of variance explained by each plotted PC.

  13. Supplementary Fig. 10: PCA using an alternate feature set still enables separation between HAP1 and K562.

    Shown is a projection of principal component 2 and principal component 3 from PCA on the intrachromosomal single cell contact matrix (n = 3,609 cells). For this analysis, only intrachromosomal contacts between 10 Mb windows were used. The matrix used for this computation is shown in Supplementary Figure 7b. Percentages shown are the percentage of variance explained by each plotted PC.

  14. Supplementary Fig. 11: Separation of cell types by PCA is consistent across biological replicate combinatorial single cell Hi-C experiments.

    Across 4 different libraries, the separation of single HeLa S3 and HAP1 cells is evident, suggesting that this is not simply a technical artifact or batch effect.

  15. Supplementary Fig. 12: PCA of single-cell interchromosomal contacts using cells from 4 different human cell types results in separation of HeLa S3 from other cell lines.

    A fifth experiment (Library ML3) containing K562 and GM12878 cells was lightly sequenced and combined with an existing HeLa S3 and HAP1 dataset (Library ML1), resulting in n = 1,394 cells. Projection of single cells onto PC2 and PC3 results in separation of HeLa S3 from the remaining three cell types, but weak separation of K562, GM12878, and HAP1. Percentages shown are the percentage of variance explained by each plotted PC.

  16. Supplementary Fig. 13: Combinatorial single cell Hi-C captures cell-to-cell heterogeneity masked by bulk measurement.

    a.) Decay in contact probability for all primary experiment (ML libraries) cells with at least 10,000 unique contacts (n = 769 cells). Plotted is the mean contact probability for each bin (purple), along with standard deviation (blue). Shuffled controls where all cellular index assignments have been randomized demonstrate strikingly lower variance compared to observed single cells, for both mouse and human. b.) Scaling coefficients calculated for a.), for distances between 50 kb and 8 Mb. Shuffled controls demonstrate a tighter distribution of coefficients compared to the observed single human cells. c.) Single-cell scaling coefficients reproducibly demonstrate positive correlation with single-cell cis:trans ratios in both mouse and human cells.
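A shuffled control of the kind mentioned in panel a.) can be sketched by permuting cellular index labels across contacts. This is one plausible reading of "randomized" and is an assumption: a permutation keeps each index's contact count unchanged while breaking the link between a contact and its cell of origin. The record layout and function name are hypothetical.

```python
import random
from collections import Counter


def shuffle_cellular_indices(records, seed=0):
    """Randomly reassign cellular indices across contacts.

    records: list of (cell_id, contact) pairs, where contact is any object.
    Returns a new list with the same contacts in the same order but with
    the cell_id labels permuted.
    """
    rng = random.Random(seed)  # seeded so the control is reproducible
    cell_ids = [cell_id for cell_id, _contact in records]
    rng.shuffle(cell_ids)
    return [(cell_id, contact)
            for cell_id, (_old_id, contact) in zip(cell_ids, records)]


toy_records = [
    ("cell_1", ("chr1", 100_000, "chr1", 900_000)),
    ("cell_1", ("chr1", 200_000, "chr1", 250_000)),
    ("cell_2", ("chr2", 5_000_000, "chr2", 5_400_000)),
    ("cell_3", ("chr3", 1_000_000, "chr3", 7_000_000)),
    ("cell_3", ("chr3", 2_000_000, "chr4", 2_000_000)),
]
shuffled = shuffle_cellular_indices(toy_records, seed=1)
print(Counter(cell_id for cell_id, _ in shuffled))
```

Because every shuffled "cell" draws its contacts from the pooled population, per-cell statistics computed on the shuffled records should approach the bulk value, which is consistent with the lower variance described in the caption.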

  17. Supplementary Fig. 14: Correlation between single cell cis:trans ratios and single-cell scaling coefficients is reproducible across combinatorial single-cell Hi-C experiments.

    We observe a correlation between high cis:trans ratios and shallow scaling coefficients in both mouse and human cells in both the PL2 (Pearson’s R = 0.199; Spearman’s ρ = 0.0713) and ML3 (Pearson’s R = 0.643; Spearman’s ρ = 0.175) experiments. It is possible that the lack of correlation / weaker correlation shown in PL1 (Pearson’s R = 0.0649; Spearman’s ρ = -0.0500) and PL2, respectively, are a result of shallower sequencing, or sampling (i.e. perhaps related to the relative abundance of unsynchronized cells in each phase of the cell cycle).

  18. Supplementary Fig. 15: “Programmed” barcoding approaches enable association of cell types with unique first round barcodes.

    By loading unique cell types into programmed wells during the first round of indexing, we are able to validate cell types in silico. This schematic shows how libraries PL1 and PL2 were generated, wherein only one cell type was present per well. By contrast, for ML1, ML2 and ML3, subsets of wells contained mixtures of one human and one mouse cell type.

Accession codes

Primary accessions

Gene Expression Omnibus: GSE84920

Change history

Corrected online 10 February 2017
In the version of this article initially published online, the Gene Expression Omnibus (GEO) accession containing all processed data and raw reads (except for HeLa data) was not provided; the correct accession, GSE84920, has now been included. The error has been corrected for the print, PDF and HTML versions of this article as of 10 February 2017.

References

  1. Ramani, V., Shendure, J. & Duan, Z. Genomics Proteomics Bioinformatics 14, 7–20 (2016).
  2. Cremer, T. & Cremer, C. Nat. Rev. Genet. 2, 292–301 (2001).
  3. van Steensel, B. & Dekker, J. Nat. Biotechnol. 28, 1089–1095 (2010).
  4. Söderberg, O. et al. Nat. Methods 3, 995–1000 (2006).
  5. Ramani, V., Qiu, R. & Shendure, J. Nat. Biotechnol. 33, 980–984 (2015).
  6. Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Science 295, 1306–1311 (2002).
  7. Lieberman-Aiden, E. et al. Science 326, 289–293 (2009).
  8. Duan, Z. et al. Nature 465, 363–367 (2010).
  9. Nagano, T. et al. Nature 502, 59–64 (2013).
  10. Cusanovich, D.A. et al. Science 348, 910–914 (2015).
  11. Klein, A.M. et al. Cell 161, 1187–1201 (2015).
  12. Macosko, E.Z. et al. Cell 161, 1202–1214 (2015).
  13. Rotem, A. et al. Nat. Biotechnol. 33, 1165–1172 (2015).
  14. Rao, S.S.P. et al. Cell 159, 1665–1680 (2014).
  15. Deng, X. et al. Genome Biol. 16, 152 (2015).
  16. Adey, A. et al. Nature 500, 207–211 (2013).
  17. Essletzbichler, P. et al. Genome Res. 24, 2059–2065 (2014).
  18. Naumova, N. et al. Science 342, 948–953 (2013).
  19. Imakaev, M. et al. Nat. Methods 9, 999–1003 (2012).
  20. Sanborn, A.L. et al. Proc. Natl. Acad. Sci. USA 112, E6456–E6465 (2015).
  21. Vitak et al. Nat. Methods http://dx.doi.org/10.1038/nmeth.4154 (2017).
  22. Ramani, V., Duan, Z. & Shendure, J. Massively multiplex single-cell Hi-C. Protocol Exchange http://dx.doi.org/10.1038/protex.2017.005 (2017).
  23. Jin, W. et al. Nature 528, 142–146 (2015).
  24. Servant, N. et al. Genome Biol. 16, 259 (2015).
  25. Carette, J.E. et al. Nature 477, 340–343 (2011).

Author information

Affiliations

  1. Department of Genome Sciences, University of Washington, Seattle, Washington, USA.

    • Vijay Ramani,
    • Ruolan Qiu,
    • William S Noble &
    • Jay Shendure
  2. Department of Pathology, University of Washington, Seattle, Washington, USA.

    • Xinxian Deng &
    • Christine M Disteche
  3. Illumina Inc., Advanced Research Group, San Diego, California, USA.

    • Kevin L Gunderson &
    • Frank J Steemers
  4. Department of Medicine, University of Washington, Seattle, Washington, USA.

    • Christine M Disteche
  5. Division of Hematology, University of Washington School of Medicine, Seattle, Washington, USA.

    • Zhijun Duan
  6. Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, USA.

    • Zhijun Duan
  7. Howard Hughes Medical Institute, Seattle, Washington, USA.

    • Jay Shendure

Contributions

V.R., Z.D., and J.S. conceived of the project. V.R., X.D., R.Q., and Z.D. carried out experiments. C.M.D. and W.S.N. provided invaluable critical input. K.L.G. and F.J.S. were part of initial discussions on novel approaches to single-cell Hi-C. V.R., Z.D., and J.S. wrote the paper.

Competing financial interests

K.L.G. and F.J.S. are employees of Illumina Inc.

Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: Nuclei remain intact through proximity ligation in the combinatorial single cell Hi-C protocol (161 KB)

  2. Supplementary Figure 2: Coverage of combinatorial single cell Hi-C cellular indices follows a bimodal distribution. (23 KB)

  3. Supplementary Figure 3: Coverage of cellular indices is not correlated between replicate experiments (26 KB)

  4. Supplementary Figure 4: Single cellular indices demonstrate high cis:trans ratios. (21 KB)

  5. Supplementary Figure 5: Quality control statistics for PL1 and PL2 libraries are similar to primary experiment libraries. (125 KB)

  6. Supplementary Figure 6: The HeLa genotype enables further filtration of potential barcode collisions in combinatorial single cell Hi-C datasets. (28 KB)

  7. Supplementary Figure 7: Raw single cell matrices used as input for PCA. (1,177 KB)

  8. Supplementary Figure 8: The first component of PCA using both interchromosomal contacts and 10 Mb windowed intrachromosomal contacts strongly correlates with coverage. (214 KB)

  9. Supplementary Figure 9: Analysis of principal component loadings for interchromosomal separation experiment reveals that translocations contribute to cell type separation in principal component space. (334 KB)

  10. Supplementary Figure 10: PCA using an alternate feature set still enables separation between HAP1 and K562. (43 KB)

  11. Supplementary Figure 11: Separation of cell types by PCA is consistent across biological replicate combinatorial single cell Hi-C experiments. (37 KB)

  12. Supplementary Figure 12: PCA of single-cell interchromosomal contacts using cells from 4 different human cell types results in separation of HeLa S3 from other cell lines. (47 KB)

  13. Supplementary Figure 13: Combinatorial single cell Hi-C captures cell-to-cell heterogeneity masked by bulk measurement. (106 KB)

  14. Supplementary Figure 14: Correlation between single cell cis:trans ratios and single-cell scaling coefficients is reproducible across combinatorial single-cell Hi-C experiments. (29 KB)

  15. Supplementary Figure 15: “Programmed” barcoding approaches enable association of cell types with unique first round barcodes. (82 KB)

PDF files

  1. Supplementary Text and Figures (3,636 KB)

    Supplementary Figures 1–15, Supplementary Table 1 and Supplementary Protocol

Excel files

  1. Supplementary Data (70 KB)

    sciHi-C barcode sequences
