We present single-cell combinatorial indexed Hi-C (sciHi-C), a method that applies combinatorial cellular indexing to chromosome conformation capture. In this proof of concept, we generate and sequence six sciHi-C libraries comprising a total of 10,696 single cells. We use sciHi-C data to separate cells by karyotypic and cell-cycle state differences and identify cell-to-cell heterogeneity in mammalian chromosomal conformation. Our results demonstrate that combinatorial indexing is a generalizable strategy for single-cell genomics.
- Genomics Proteomics Bioinformatics 14, 7–20 (2016).
- Nat. Rev. Genet. 2, 292–301 (2001).
- Nat. Biotechnol. 28, 1089–1095 (2010).
- Nat. Methods 3, 995–1000 (2006).
- Nat. Biotechnol. 33, 980–984 (2015).
- Science 295, 1306–1311 (2002).
- Science 326, 289–293 (2009).
- Nature 465, 363–367 (2010).
- Nature 502, 59–64 (2013).
- Science 348, 910–914 (2015).
- Cell 161, 1187–1201 (2015).
- Cell 161, 1202–1214 (2015).
- Nat. Biotechnol. 33, 1165–1172 (2015).
- Cell 159, 1665–1680 (2014).
- Genome Biol. 16, 152 (2015).
- Nature 500, 207–211 (2013).
- Genome Res. 24, 2059–2065 (2014).
- Science 342, 948–953 (2013).
- Nat. Methods 9, 999–1003 (2012).
- Proc. Natl. Acad. Sci. USA 112, E6456–E6465 (2015).
- Nat. Methods http://dx.doi.org/10.1038/NMETH.4154 (2017).
- Massively multiplex single-cell Hi-C. Protocol Exchange http://dx.doi.org/10.1038/protex.2017.005 (2017).
- Nature 528, 142–146 (2015).
- Genome Biol. 16, 259 (2015).
- Nature 477, 340–343 (2011).
- Supplementary Figure 1: Nuclei remain intact through proximity ligation in the combinatorial single cell Hi-C protocol (161 KB)
Phase contrast microscopy of HeLa S3 and HAP1 nuclei following proximity ligation and serial dilution shows that nuclei remain intact throughout the combinatorial single cell Hi-C protocol (scale bar = 100 μm).
- Supplementary Figure 2: Coverage of combinatorial single cell Hi-C cellular indices follows a bimodal distribution. (23 KB)
A histogram of the coverage (i.e., the number of unique reads) of combinatorial single cell Hi-C cellular indices in two replicate libraries reveals a bimodal distribution, where low-coverage cellular indices likely represent barcoding of free DNA in solution rather than intact nuclei.
- Supplementary Figure 3: Coverage of cellular indices is not correlated between replicate experiments (26 KB)
Scatter plot of coverage per cellular index for all cellular indices with at least 1 unique read in both replicate combinatorial single cell Hi-C libraries. A Pearson’s r of -0.03 suggests that there is minimal intrinsic bias (i.e., a “barcode” effect) in the coverage of particular cellular indices.
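The comparison described in this caption can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the dictionary layout (cellular index mapped to unique read count) are assumptions made for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def replicate_coverage_correlation(cov_rep1, cov_rep2):
    """Correlate coverage over indices with >= 1 unique read in both replicates.

    cov_rep1, cov_rep2: dicts mapping a cellular index (hypothetical key,
    e.g. a barcode-pair string) to its unique read count.
    """
    shared = sorted(
        k for k in cov_rep1
        if cov_rep1[k] >= 1 and cov_rep2.get(k, 0) >= 1
    )
    return pearson_r([cov_rep1[k] for k in shared],
                     [cov_rep2[k] for k in shared])
```

A value near zero, as reported above, would indicate that a barcode with high coverage in one replicate is not systematically high in the other.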
- Supplementary Figure 4: Single cellular indices demonstrate high cis:trans ratios. (21 KB)
Histogram of the cis:trans ratios for cellular indices over two biological replicates. High cis:trans ratios suggest that nuclei remain intact during the protocol, and hint at a single-cellular origin for the majority of cellular indices.
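As an illustration of the statistic plotted here, the cis:trans ratio of one cellular index can be computed from its contact list. The tuple layout `(chrom_a, pos_a, chrom_b, pos_b)` is an assumption for this sketch, not a format taken from the paper.

```python
def cis_trans_ratio(contacts):
    """Ratio of intrachromosomal (cis) to interchromosomal (trans) contacts.

    contacts: iterable of (chrom_a, pos_a, chrom_b, pos_b) tuples
    (hypothetical layout chosen for this example).
    """
    cis = trans = 0
    for chrom_a, _pos_a, chrom_b, _pos_b in contacts:
        if chrom_a == chrom_b:
            cis += 1
        else:
            trans += 1
    if trans == 0:
        # No trans contacts: ratio is unbounded, or undefined if there is no data.
        return float("inf") if cis else float("nan")
    return cis / trans
```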
- Supplementary Figure 5: Quality control statistics for PL1 and PL2 libraries are similar to primary experiment libraries. (125 KB)
a.) Violin plots showing the distribution of ligation types across all cellular indices with at least 1,000 reads in libraries PL1 and PL2. b.) Species specificity for both libraries.
- Supplementary Figure 6: The HeLa genotype enables further filtration of potential barcode collisions in combinatorial single cell Hi-C datasets. (28 KB)
We examined all homozygous non-reference sites determined by Adey et al. and tabulated the fraction of sites where the non-reference allele was found in our sequencing reads. Single HeLa cells are expected to carry the non-reference allele at a very high fraction (i.e., ≥99%) of those sites, and reduced fractions indicate contamination by HAP1. For this study, we drew conservative cutoffs of 57% and 99% for each species (i.e., any cellular indices falling between these values were discarded).
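The filter described above might look like the sketch below. The 57% and 99% cutoffs come from the caption; the label assigned below the lower cutoff, the handling of values exactly at a cutoff, and the function name are assumptions made for illustration.

```python
def classify_by_genotype(nonref_sites, covered_sites, low=0.57, high=0.99):
    """Classify one cellular index by its non-reference allele fraction.

    nonref_sites:  covered HeLa homozygous non-reference sites where the
                   non-reference allele was observed
    covered_sites: all such sites covered by this cellular index
    Returns "HeLa", "HAP1" (assumed label for low fractions), "discard",
    or "no_data" when no sites are covered.
    """
    if covered_sites == 0:
        return "no_data"
    fraction = nonref_sites / covered_sites
    if fraction >= high:
        return "HeLa"
    if fraction <= low:
        return "HAP1"  # assumption: low fractions are treated as HAP1
    return "discard"   # between the cutoffs: potential barcode collision
```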
- Supplementary Figure 7: Raw single cell matrices used as input for PCA. (1,177 KB)
To generate these matrices, we took single-cell contact maps and vectorized them, such that each cell is represented by a vector of non-redundant contact counts between two loci. For interchromosomal analysis, each vector contained the log10 transformed number of raw counts between two chromosomes; for intrachromosomal analysis, each vector contained a 1 if a contact between two 10 Mb intrachromosomal windows was observed, and 0 if not. These vectors were then concatenated to form the heat maps above. The pairwise bin ID simply represents a label for each pair of interacting windows represented in the heat maps.
a.) A heat map representation of a portion (250 cells) of the input interchromosomal matrix for PCA. Rows represent single human cells, while columns represent pairwise interactions between two whole chromosomes. For this analysis, raw counts were used, and n = 3,609 cells.
b.) Heat map representation of a portion (2,500 cells) of the input intrachromosomal matrix for PCA. Here, interchromosomal counts were ignored, and interaction frequencies between discrete 10 Mb windows genome-wide were reduced to a binary representation (i.e. 1 if present, 0 if absent). Again, n = 3,609 cells.
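The vectorization described in this caption can be sketched for a single cell as follows. Several details are assumptions rather than the authors' choices: the contact tuple layout, the pseudocount of 1 added before the log10 transform (to avoid log10 of zero), and the inclusion of same-window pairs in the intrachromosomal features.

```python
import math
from itertools import combinations

WINDOW = 10_000_000  # 10 Mb windows, as in the caption

def interchromosomal_vector(contacts, chroms, pseudocount=1):
    """log10(count + pseudocount) for every unordered chromosome pair."""
    pairs = list(combinations(chroms, 2))
    order = {c: i for i, c in enumerate(chroms)}
    counts = dict.fromkeys(pairs, 0)
    for ca, _pa, cb, _pb in contacts:
        if ca == cb or ca not in order or cb not in order:
            continue  # skip cis contacts and unknown chromosomes
        key = (ca, cb) if order[ca] < order[cb] else (cb, ca)
        counts[key] += 1
    return [math.log10(counts[p] + pseudocount) for p in pairs]

def intrachromosomal_vector(contacts, chrom_sizes, window=WINDOW):
    """1 if any contact links a pair of windows on one chromosome, else 0."""
    features = []
    for chrom, size in chrom_sizes.items():
        n_bins = -(-size // window)  # ceiling division
        features += [(chrom, i, j)
                     for i in range(n_bins) for j in range(i, n_bins)]
    seen = set()
    for ca, pa, cb, pb in contacts:
        if ca == cb and ca in chrom_sizes:
            i, j = sorted((pa // window, pb // window))
            seen.add((ca, i, j))
    return [1 if f in seen else 0 for f in features]
```

Stacking one such vector per cell as rows gives a cells-by-features matrix of the kind shown in the heat maps, which can then be passed to any PCA routine.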
- Supplementary Figure 8: The first component of PCA using both interchromosomal contacts and 10 Mb windowed intrachromosomal contacts strongly correlates with coverage. (214 KB)
a.) Correlation between principal component 1 (PC1) and coverage for interchromosomal interactions (ρ = -0.917). b.) Correlation between principal component 1 (PC1) and coverage for interacting 10 Mb intrachromosomal windows (ρ = 0.897).
- Supplementary Figure 9: Analysis of principal component loadings for interchromosomal separation experiment reveals that translocations contribute to cell type separation in principal component space. (334 KB)
a.) Heat map of loadings for principal component 2 after all known translocations (blacked out entries) are removed from the analysis. b.) After removing all entries corresponding to known translocations from the interchromosomal single-cell Hi-C contact matrix, cell-type separation using PC1 and PC2 is qualitatively worse but still apparent, suggesting that cell-type specific interchromosomal contacts may contribute to the observed separation pattern. Percentages shown are the percentage of variance explained by each plotted PC.
- Supplementary Figure 10: PCA using an alternate feature set still enables separation between HAP1 and K562. (43 KB)
Shown is a projection of principal component 2 and principal component 3 from PCA on the intrachromosomal single cell contact matrix (n = 3,609 cells). For this analysis, only intrachromosomal contacts between 10 Mb windows were used. The matrix used for this computation is shown in Supplementary Figure 7b. Percentages shown are the percentage of variance explained by each plotted PC.
- Supplementary Figure 11: Separation of cell types by PCA is consistent across biological replicate combinatorial single cell Hi-C experiments. (37 KB)
Across 4 different libraries, the separation of single HeLa S3 and HAP1 cells is evident, suggesting that this is not simply a technical artifact or batch effect.
- Supplementary Figure 12: PCA of single-cell interchromosomal contacts using cells from 4 different human cell types results in separation of HeLa S3 from other cell lines. (47 KB)
A fifth experiment (Library ML3) containing K562 and GM12878 cells was lightly sequenced and combined with an existing HeLa S3 and HAP1 dataset (Library ML1), resulting in n = 1,394 cells. Projection of single cells onto PC2 and PC3 results in separation of HeLa S3 from the remaining three cell types, but weak separation of K562, GM12878, and HAP1. Percentages shown are the percentage of variance explained by each plotted PC.
- Supplementary Figure 13: Combinatorial single cell Hi-C captures cell-to-cell heterogeneity masked by bulk measurement. (106 KB)
a.) Decay in contact probability for all primary experiment (ML libraries) cells with at least 10,000 unique contacts (n = 769 cells). Plotted is the mean contact probability for each bin (purple), along with standard deviation (blue). Shuffled controls where all cellular index assignments have been randomized demonstrate strikingly lower variance compared to observed single cells, for both mouse and human. b.) Scaling coefficients calculated for a.), for distances between 50 kb and 8 Mb. Shuffled controls demonstrate a tighter distribution of coefficients compared to the observed single human cells. c.) Single-cell scaling coefficients reproducibly demonstrate positive correlation with single-cell cis:trans ratios in both mouse and human cells.
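One common way to obtain a scaling coefficient like the one in panel b is to fit a line to log contact probability against log genomic distance. The sketch below does this for one cell over the 50 kb to 8 Mb range given in the caption; the log-spaced binning, the number of bins, and the least-squares fit are assumptions, since the caption does not specify the fitting procedure.

```python
import math

def scaling_coefficient(distances, lo=50_000, hi=8_000_000, n_bins=10):
    """Slope of log10 contact density versus log10 distance for one cell.

    distances: cis contact distances in bp for a single cellular index.
    Contacts are counted in log-spaced bins between lo and hi, each count
    is divided by the bin width, and a least-squares line is fitted.
    """
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / n_bins
    counts = [0] * n_bins
    for d in distances:
        if lo <= d < hi:
            b = min(int((math.log10(d) - log_lo) / step), n_bins - 1)
            counts[b] += 1
    total = sum(counts)
    xs, ys = [], []
    for b, c in enumerate(counts):
        if c == 0:
            continue  # empty bins carry no information for the fit
        width = 10 ** (log_lo + (b + 1) * step) - 10 ** (log_lo + b * step)
        xs.append(log_lo + (b + 0.5) * step)
        ys.append(math.log10(c / total / width))
    if len(xs) < 2:
        raise ValueError("need at least two non-empty bins to fit a slope")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

Under this convention a coefficient closer to zero corresponds to a shallower decay. A shuffled control can be built by reassigning contacts to cellular indices at random and recomputing the coefficient per index.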
- Supplementary Figure 14: Correlation between single cell cis:trans ratios and single-cell scaling coefficients is reproducible across combinatorial single-cell Hi-C experiments. (29 KB)
We observe a correlation between high cis:trans ratios and shallow scaling coefficients in both mouse and human cells in both the PL2 (Pearson’s R = 0.199; Spearman’s ρ = 0.0713) and ML3 (Pearson’s R = 0.643; Spearman’s ρ = 0.175) experiments. It is possible that the lack of correlation in PL1 (Pearson’s R = 0.0649; Spearman’s ρ = -0.0500) and the weaker correlation in PL2 result from shallower sequencing or from sampling (i.e., perhaps related to the relative abundance of unsynchronized cells in each phase of the cell cycle).
- Supplementary Figure 15: “Programmed” barcoding approaches enable association of cell types with unique first round barcodes. (82 KB)
By loading unique cell types into programmed wells during the first round of indexing, we are able to validate cell types in silico. This schematic shows how libraries PL1 and PL2 were generated, wherein only one cell type was present per well. By contrast, for ML1, ML2 and ML3, subsets of wells contained mixtures of one human and one mouse cell type.
- Supplementary Text and Figures (3,636 KB)
Supplementary Figures 1–15, Supplementary Table 1 and Supplementary Protocol
- Supplementary Data (70 KB)
sciHi-C barcode sequences