Three-dimensional reconstruction of single-cell chromosome structure using recurrence plots

Hirata, Yoshito; Oda, Arisa; Ohta, Kunihiro; Aihara, Kazuyuki

doi:10.1038/srep34982

Article
Open access
Published: 11 October 2016

Scientific Reports volume 6, Article number: 34982 (2016)

Abstract

Single-cell analysis of the three-dimensional (3D) chromosome structure can reveal cell-to-cell variability in genome activities. Here, we propose to apply recurrence plots, a mathematical method of nonlinear time series analysis, to reconstruct the 3D chromosome structure of a single cell based on information of chromosomal contacts from genome-wide chromosome conformation capture (Hi-C) data. This recurrence plot-based reconstruction (RPR) method enables rapid reconstruction of a unique structure in single cells, even from incomplete Hi-C information.

Introduction

With the development of measurement techniques, we are beginning to glimpse that chromosomal structures in individual cells may be differentially organized to exhibit distinct genome and chromatin activities. Such cell-to-cell variability might be an important aspect of cell physiology during differentiation or cellular responses. Genome-wide chromosome conformation capture (Hi-C) analysis^1,2,3 has been developed to estimate the three-dimensional (3D) chromosome structure, allowing more accurate detection of local chromosomal contacts in a single cell^4,5. However, single-cell Hi-C normally yields only limited information on DNA contacts owing to the limited recovery of nuclear DNA from a single cell, whereas regular Hi-C data also provide read counts, which enable more detailed estimation. Reconstruction of the 3D chromosome structure from such incomplete data is often difficult and requires long calculation times, even with high-performance computers.

In a mathematical sense, reconstruction of the 3D chromosome structure from Hi-C data can be considered a problem of geometry. Therefore, many ad hoc computational methods have been proposed^6,7 for reconstructing the 3D chromosome structure from Hi-C data. However, such methods have several disadvantages. First, there is no mathematical support that guarantees the correctness of the estimation. Second, such methods are often formulated as stochastic optimizations, and thus, their solutions are not unique. Finally, we sometimes need to use additional information other than the contact information for every pair of DNA segments.

Here, to overcome such difficulties, we propose a new mathematical method using recurrence plots^8,9 for reconstruction of the 3D chromosome structure. This method, referred to as the recurrence plot-based reconstruction (RPR) method, enables the reconstruction of a unique 3D chromosome structure solely from contact map information obtained by Hi-C analysis. In addition, the RPR method is applicable even to low-coverage chromosomal contact Hi-C data from a single cell. The algorithm can indeed reconstruct high-resolution whole chromosomal structures, even in the case of a single mammalian cell with a large genome.

Methods

A recurrence plot^8,9 is a tool of nonlinear time series analysis for visualizing temporal patterns within a time series. It is a two-dimensional plot whose two axes show the same time axis. A recurrence plot shows whether or not the states corresponding to each pair of times are close to each other in state space (see Fig. 1). Mathematically, a recurrence plot R can be defined as follows:

$$R(i, j) = \begin{cases} 1, & \text{if } \lVert x_i - x_j \rVert < \varepsilon, \\ 0, & \text{otherwise,} \end{cases} \tag{1}$$

where x_i is a state corresponding to time i, and ε is a predefined threshold. We plot a point at (i, j) when R(i, j) = 1 and nothing there if R(i, j) = 0 (see Fig. 1). Thus, the recurrence plot allows for display of the closeness between every pair of points, similar to Hi-C data. Therefore, if we regard chromosomal positions as time points in a time axis, we can naturally treat Hi-C data as a recurrence plot.
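As an illustration of this definition, the following minimal sketch computes a binary recurrence plot from a sequence of states. It assumes the Euclidean norm for closeness; the function name `recurrence_plot` and the toy circular trajectory are introduced here for illustration only and are not part of the original work.

```python
import numpy as np


def recurrence_plot(states, eps):
    """Return the binary matrix R with R[i, j] = 1 when the Euclidean
    distance between states i and j is below the threshold eps."""
    states = np.asarray(states, dtype=float)
    if states.ndim == 1:
        # Treat a scalar time series as one-dimensional states.
        states = states[:, None]
    diff = states[:, None, :] - states[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist < eps).astype(np.uint8)


# Toy example: two turns around the unit circle, so states that are
# one period apart come close to each other again.
t = np.linspace(0.0, 4.0 * np.pi, 200)
circle = np.column_stack([np.cos(t), np.sin(t)])
R = recurrence_plot(circle, eps=0.2)
```

In this toy example the main diagonal is filled (every state is close to itself), and a second band parallel to the diagonal marks the return of the trajectory after one turn. A Hi-C contact map read as a recurrence plot has the same form, with chromosomal positions playing the role of time indices.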

Several techniques^10,11,12 have been developed for the recovery of a rough shape of the original time series solely from the information of its recurrence plot. However, we have chosen to focus on the RPR method we previously proposed in 2008¹¹ because (i) it works, even if the original time series is multivariate, and (ii) we have proven a theorem¹³ that the metric space recovered using our prior method¹¹ is equivalent to the original Euclidean metric under mild conditions. In addition, the RPR method is known to be rather robust, even if we change the definition of the closeness¹⁴. Therefore, we used the method to reconstruct the 3D chromosome structure from Hi-C data.

The proposed procedure, which follows our previous method¹¹, is summarized in Figs 2 and 3. First, we convert Hi-C data into a weighted graph (see Figs 2 and 3). In this graph, each node corresponds to a part of a chromosome, and we connect two nodes if the Hi-C data show that the corresponding two parts are located in close proximity (within the defined threshold). In addition, we assign the following local distance to the edge between two nodes i and j:

$$d(i, j) = 1 - \frac{|G_i \cap G_j|}{|G_i \cup G_j|}, \tag{2}$$

where G_i = {k|R(k, i) = 1} is the set of indices plotted in the ith column of the recurrence plot, and thus of the corresponding Hi-C data (see Supplementary Texts A for an intuitive explanation of why Eq. (2) gives the original local distance between points i and j when they are close to each other). Second, we obtain the shortest distance for every pair of nodes by the Dijkstra method¹⁵, which yields a matrix of global distances. Third, we use multidimensional scaling¹⁶ to visualize the information in the distance matrix. By choosing the components of the multidimensional scaling with the three largest eigenvalues, we can reconstruct the 3D chromosome structure (see Figs 2 and 3).

The RPR method is designed especially for low-coverage data such as single-cell Hi-C data. The key point of the algorithm is that it treats contact maps as binary, without coverage information. Namely, if two chromosome segments are detected as neighbors at least once within a Hi-C dataset, we assign 1 to the corresponding two elements of the contact map; otherwise, we assign 0. This definition of the binary contact map therefore applies to both single-cell Hi-C data and regular Hi-C data and does not need a threshold for the binarization. The threshold ε for the recurrence plot is defined intrinsically by the experimental limits of detectable contacts in the sparse Hi-C method as well as in the regular Hi-C method. When the coverage depth is too high, the regular Hi-C data may be pre-processed to reduce the number of counts. However, this pre-processing will presumably be unnecessary because the proposed RPR method can be used even when as many as 70% of the elements of the binary contact map are 1 (see ref. 11).
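The three steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab implementation. It assumes that the local distance is the Jaccard-type distance between the neighbor sets G_i and G_j, that the contact map is symmetric, and that SciPy's `shortest_path` is an acceptable stand-in for the Dijkstra step. The function names, the small positive `floor` (needed because SciPy treats a zero entry in a dense matrix as "no edge"), and the helix test data are all assumptions made for this example.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path


def jaccard_local_distances(R, floor=1e-9):
    """Weighted adjacency matrix: 1 - |G_i ∩ G_j| / |G_i ∪ G_j| on
    contacts, and 0 (meaning "no edge") everywhere else."""
    R = (np.asarray(R) > 0).astype(float)
    inter = R.T @ R                      # |G_i ∩ G_j| for every pair of columns
    sizes = R.sum(axis=0)                # |G_i|
    union = sizes[:, None] + sizes[None, :] - inter
    ratio = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    d = 1.0 - ratio
    edge = (R > 0) & ~np.eye(R.shape[0], dtype=bool)
    # Clip to a tiny positive value so that identical neighbor sets
    # still produce an edge in the dense graph representation.
    return np.where(edge, np.maximum(d, floor), 0.0)


def classical_mds(D, dim=3):
    """Classical multidimensional scaling of a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    w, v = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:dim]      # indices of the largest eigenvalues
    return v[:, top] * np.sqrt(np.clip(w[top], 0.0, None))


def rpr_reconstruct(R, dim=3):
    """Binary contact map -> local distances -> global distances -> coordinates."""
    W = jaccard_local_distances(R)
    D = shortest_path(W, method="D", directed=False)
    if np.isinf(D).any():
        raise ValueError("contact graph is not connected")
    return classical_mds(D, dim)


# Toy data: a helix sampled at 60 points, converted to a binary contact map.
t = np.linspace(0.0, 4.0 * np.pi, 60)
helix = np.column_stack([np.cos(t), np.sin(t), 0.3 * t])
gaps = np.linalg.norm(helix[:, None, :] - helix[None, :, :], axis=-1)
contacts = (gaps < 1.0).astype(np.uint8)
coords = rpr_reconstruct(contacts)
```

The returned `coords` array has one row per node and three columns. Because only binary contacts enter the calculation, the reconstruction is defined up to overall scale, rotation, and reflection.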

A necessary condition for our previous method¹¹ is that all nodes are connected with each other in the weighted graph, i.e., there is a path connecting every pair of nodes. To fulfill this condition, we used a previously described method¹² that identifies successive parts of a chromosome as neighbors, and declared that R(i, i + 1) = R(i + 1, i) = 1 when the parts corresponding to i and (i + 1) are on the same chromosome. Using this technique, we can ensure that the proposed reconstruction is possible in most cases. If, instead, we set d(i, j) to a constant value when regions i and j have a contact or |i−j| ≤ 1, the RPR method becomes more similar to the method of Paulsen et al.¹⁷.
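The backbone rule R(i, i + 1) = R(i + 1, i) = 1 can be sketched as below. The function name `add_backbone_contacts` and the per-bin chromosome labels `chrom_ids` are hypothetical and introduced only for this example; bins are assumed to be ordered along each chromosome.

```python
import numpy as np


def add_backbone_contacts(R, chrom_ids):
    """Mark consecutive bins on the same chromosome as neighbors."""
    R = np.array(R, dtype=np.uint8, copy=True)
    chrom_ids = np.asarray(chrom_ids)
    # Indices i for which bins i and i + 1 belong to the same chromosome.
    same = np.flatnonzero(chrom_ids[:-1] == chrom_ids[1:])
    R[same, same + 1] = 1
    R[same + 1, same] = 1
    return R


# Five bins: three on chrX followed by two on chr1, with no measured contacts.
labels = ["chrX", "chrX", "chrX", "chr1", "chr1"]
R_backbone = add_backbone_contacts(np.eye(5, dtype=np.uint8), labels)
```

Note that this rule links bins only within a chromosome. Different chromosomes are joined in the graph only through measured inter-chromosomal contacts, which is consistent with the statement that reconstruction is possible in most, but not all, cases.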

Thus, just as the three-dimensional structure of a strange attractor of deterministic chaos can be reconstructed (Fig. 2), we reconstruct the three-dimensional structure of chromosomes by regarding a contact map of Hi-C data as a recurrence plot. Figure 3 shows the similarity between the two reconstruction processes.

Results

Single-cell Hi-C cannot report all the chromosomal interactions within a cell. In addition, sequencing data sometimes include errors. We therefore tested with toy models whether the RPR method tolerates false-positive noise and/or missing information, as are often found in actual Hi-C data. Here we used the Lorenz model¹⁸ and the Rössler model¹⁹ as examples of three-dimensional objects: like chromosomes, they lie in three-dimensional space, so the mathematical background for reconstructing their three-dimensional structures from recurrence plots is similar to that for reconstructing chromosomes from contact maps. Supplementary Figure 1 shows the results for these two models, which are popular models of deterministic chaos, with 1% bit flips and 90% random loss of information on the closeness. Correlations between the original data and the noisy data remained as high as 0.70–0.92 in the test. Furthermore, datasets with only 0.2% of closest interaction information were used to mimic single-cell Hi-C data. Even then, we could visually recognize the topological similarities between the original shapes (Supplementary Figure 1a,f) and the reconstructions from the limited information (Supplementary Figure 1e,j), suggesting that this method is sufficiently tolerant.
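One way to set up such a robustness test is sketched below. The exact noise protocol is not given in the main text, so this is only one possible reading: bit flips are applied independently to each off-diagonal pair, and "loss" means that a remaining contact is erased with a given probability. The function name `degrade_contacts`, its parameters, and the banded toy contact map are assumptions made for this example.

```python
import numpy as np


def degrade_contacts(R, flip_frac=0.01, loss_frac=0.9, seed=0):
    """Return a symmetric, noisy copy of a binary contact map."""
    rng = np.random.default_rng(seed)
    R = np.asarray(R).astype(bool)
    n = R.shape[0]
    iu, ju = np.triu_indices(n, k=1)     # each unordered pair once
    vals = R[iu, ju].copy()
    # False positives and false negatives: flip a fraction of the bits.
    vals ^= rng.random(vals.size) < flip_frac
    # Missing information: erase a fraction of the remaining contacts.
    vals &= ~(rng.random(vals.size) < loss_frac)
    out = np.eye(n, dtype=bool)          # every bin stays close to itself
    out[iu, ju] = vals
    out[ju, iu] = vals
    return out.astype(np.uint8)


# Toy contact map: each bin touches its three nearest neighbors on either side.
idx = np.arange(50)
band = (np.abs(idx[:, None] - idx[None, :]) <= 3).astype(np.uint8)
noisy = degrade_contacts(band)
```

The degraded map can then be passed to the same reconstruction procedure as the clean map, and the two sets of reconstructed distances compared. In practice the degraded graph may become disconnected, so a connectivity check is needed before the shortest-path step.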

Finally, we show an example of estimation for X chromosomes in male mouse T_H1 cells from Hi-C data for different single cells and from ensemble (regular) Hi-C data (Fig. 4)⁵. We first reconstructed distance information as distance matrices to reconstruct the 3D structure of each X chromosome at two different resolutions (500 and 250 kb; Fig. 4a,b). Between the two reconstructed distances at 500- and 250-kb resolutions, we obtained correlation coefficients of 0.9598, 0.9492, and 0.9481 for cells 1, 2, and 3, respectively. Therefore, our reconstructions seemed to be consistent and reasonable. In addition, at the 250-kb resolution, we obtained correlation coefficients of 0.7802, 0.7456, and 0.7405 between the reconstructed distributions of cell 1 and cell 2, those of cell 1 and cell 3, and those of cell 2 and cell 3, respectively. When we applied multidimensional scaling¹⁶ to the reconstructed distances, we obtained 3D chromosome structures (Fig. 4c) with a common feature, i.e., one of the X chromosome telomeres and an open loop protruded from a cluster of the other parts of the X chromosome. The topological structure that we found coincided well with that in Fig. 3A of Paulsen et al.¹⁷, where the same Hi-C data were used to reconstruct the three-dimensional structures.
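A correlation coefficient between two reconstructed distance matrices of the same size can be computed as in the following sketch. It assumes a Pearson correlation over the off-diagonal entries, which the text does not state explicitly. The function name and the random test points are hypothetical. Comparing matrices at different resolutions (for example 500 kb against 250 kb) would additionally require bringing both to a common binning, a step not described in the main text and not shown here.

```python
import numpy as np


def distance_matrix_correlation(D1, D2):
    """Pearson correlation between the upper triangles of two
    square distance matrices of equal size."""
    D1 = np.asarray(D1, dtype=float)
    D2 = np.asarray(D2, dtype=float)
    if D1.ndim != 2 or D1.shape[0] != D1.shape[1] or D1.shape != D2.shape:
        raise ValueError("expected two square matrices of the same shape")
    iu = np.triu_indices(D1.shape[0], k=1)   # skip the zero diagonal
    return float(np.corrcoef(D1[iu], D2[iu])[0, 1])


def pairwise_distances(points):
    """Euclidean distances between all pairs of rows."""
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)


rng = np.random.default_rng(0)
pts = rng.normal(size=(30, 3))
D_a = pairwise_distances(pts)
D_b = pairwise_distances(pts + 0.05 * rng.normal(size=pts.shape))

r_scaled = distance_matrix_correlation(D_a, 2.0 * D_a)   # same shape, different scale
r_noisy = distance_matrix_correlation(D_a, D_b)          # slightly perturbed shape
```

Because the Pearson correlation is invariant to a uniform rescaling of the distances, it suits reconstructions whose overall scale is arbitrary.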

We further applied the RPR method to cells 4–10 of ref. 5 at 250-kb resolution (Supplementary Figure 2) as well as to cells 1–10 of ref. 5 at 50-kb resolution (see Fig. 5 and Supplementary Figure 3). Despite some common features, such as an open loop at a position of 30 Mb in all cells, careful inspection reveals some cell-to-cell variability in the open loops seen at positions of 70 or 145 Mb. We further confirmed that application of the RPR method to the ensemble Hi-C data could reproduce similar features (Figs 4 and 5, and Supplementary Figure 3).

There is some cell-to-cell variability among these cells. To demonstrate it, we focused on the similarities and differences between local structures at the scale of 1 Mb by taking “3D correlation coefficients” (see Supplementary Texts B for the details of the calculations). Figure 6a shows the correlation of local structures between every pair of the 10 individual cells, while Fig. 6b shows the correlation between the ensemble and each individual cell. The cell-to-cell variability is small where the reconstruction for each cell agrees with that of the ensemble (see Fig. 6a,b). This agreement means that the 3D structure from ensemble Hi-C, which is obtained from a mass of cells, clearly shows the common features observed in most of the ten individual cells analyzed by single-cell Hi-C. On the other hand, structural fluctuation was observed at other parts of the chromosomes, where the topology for each cell differs from that of the ensemble. This structural fluctuation suggests the potential of single-cell Hi-C analysis to show chromosome structure variability that cannot be seen in ensemble Hi-C. Indeed, Fig. 6c shows how the topological structures are similar and different among individual cells.

The Th1 ensemble Hi-C data and single-cell Hi-C data were acquired from the GEO database (http://www.ncbi.nlm.nih.gov/geo/, accession GSE48262; GSM1173492 for the ensemble Hi-C data, and from GSM1173493 to GSM1173502 for the 10 single cell Hi-C data).

Discussions

The RPR method is qualitatively different from other methods^{2,7,20,21,22,23,24,25,26,27,28} of chromosomal structural reconstruction (except for that of Paulsen et al.¹⁷) in that it uses single-cell Hi-C data without any consideration of read counts²⁹. The method of Paulsen et al.¹⁷ has been reported as appropriate for single-cell Hi-C data. Whereas the method of Paulsen et al.¹⁷ uses only the information of whether two chromosome segments are close or not, connecting them by an edge with a constant local distance, the RPR method can reflect information from distant segments when connecting two segments by an edge: it refines how close they are by reconstructing the local distance by Eq. (2), which is strongly related to the Jaccard coefficient³⁰ (see also Supplementary Texts A). Thus, the RPR method potentially enables us to reproduce the metric space and thus rationally estimate three-dimensional chromosome structures, especially from very low-coverage Hi-C data such as single-cell Hi-C data. To compensate for the rough approximation of each local distance, the method of Paulsen et al.¹⁷ intentionally puts less weight on distantly located pairs to reduce the influence of noise, since a long path composed of many edges is less reliable. On the other hand, the RPR method currently has no special mechanism to reduce such noise, because it is tolerant to noise and missing information, as shown in Supplementary Figure 1: a small fraction of bit flips or missing bits causes only small differences in the evaluation of Eq. (2) and thus in the reconstructed topological structures. Hence, if we combine the noise-reducing mechanism of Paulsen et al.¹⁷ for long paths with the weighted graph of the RPR method, we may be able to reproduce the 3D structure of chromosomes more accurately. This direction remains an open problem.

Another important advantage of the RPR method is that it provides a unique three-dimensional reconstruction. This is because (i) the proposed method converts a contact map to a distance matrix in a deterministic way, and (ii) the classical multidimensional scaling implemented in Matlab gives a unique configuration for a spatial arrangement of points given their distance matrix.
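For reference, the textbook form of classical multidimensional scaling makes the source of this determinism explicit. The following is a sketch in standard notation; the symbols n, D, J, B, λ_k, v_k, and X are introduced here for illustration and do not appear in the original text. Given the n × n matrix D of global distances,

$$J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^{\top}, \qquad B = -\frac{1}{2}\, J\, D^{(2)} J, \qquad D^{(2)}_{ij} = D_{ij}^{2},$$

$$B v_k = \lambda_k v_k, \quad \lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \dots, \qquad X = \left[\sqrt{\lambda_1}\, v_1,\ \sqrt{\lambda_2}\, v_2,\ \sqrt{\lambda_3}\, v_3\right].$$

Since X is obtained from an eigendecomposition rather than from a stochastic optimization, the same distance matrix always yields the same configuration. Strictly speaking, the configuration is determined up to rotation and reflection (including the sign of each eigenvector), provided that the three leading eigenvalues are positive and distinct.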

In addition, the RPR method has other advantages in comparison with the conventional methods^{2,7,17,20,21,22,23,24,25,26,27,28} such as the methods by Lesne et al.⁷ and Paulsen et al.¹⁷. First, the RPR method needs only the information of the contact map, whereas the method by Lesne et al.⁷ requires additional information such as the frequencies of pairs or read counts; hence the RPR method is suitable for Hi-C data with few read counts. Second, the proposed method is based on mathematical proofs^13,14 and seems to work well with larger datasets. For example, the RPR method functioned well even for a large-scale dataset with over 10,000 points³, whereas the example with Lesne’s method handled only up to 1,000 points⁷. Third, the proposed method can be carried out within a reasonable time on an easily available computer. For example, it takes less than an hour to reconstruct the 3D chromosome structure for a whole set of chromosomes of a single cell at 250-kb resolution if we replace the Dijkstra method with the Johnson method³¹, even on a conventional computer with 2 × 2.66 GHz 6-Core Intel Xeon processors and 64 GB of memory, using code implemented in Matlab (Supplementary Codes). To make the comparison reasonable and fair, we used the code provided at the link in ref. 17 and reproduced the structure of the X chromosome at 50-kb resolution on the same computer. We found that the method of ref. 17 needed 764 seconds, while the proposed RPR method needed 157 seconds and reproduced the typical topology appropriately. (Here we replaced the Dijkstra method with the Johnson³¹ method implemented in Matlab’s graphallshortestpaths function. This function, which finds the shortest path lengths for all pairs of nodes, is used in the original code of ref. 17, and we therefore decided to employ it for speeding up; see Supplementary Table 1 and Supplementary Figure 4 for further comparisons.) Thus, the proposed method is expected to provide a breakthrough for the reconstruction of the 3D chromosomal structure from Hi-C data.
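The swap between all-pairs shortest-path algorithms mentioned above has a close analogue in SciPy, sketched below. This is not the authors' Matlab code: it assumes that `scipy.sparse.csgraph.shortest_path` with `method="D"` (Dijkstra) and `method="J"` (Johnson) is an acceptable stand-in for the Matlab routines, and the toy graph with its edge weights is invented for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Toy weighted graph on 200 nodes: a backbone chain (i, i + 1) with
# weight 0.5 plus longer-range edges (i, i + 10) with weight 3.0.
n = 200
rows = np.concatenate([np.arange(n - 1), np.arange(n - 10)])
cols = np.concatenate([np.arange(1, n), np.arange(10, n)])
weights = np.concatenate([np.full(n - 1, 0.5), np.full(n - 10, 3.0)])
graph = csr_matrix((weights, (rows, cols)), shape=(n, n))

# directed=False treats every stored edge as usable in both directions.
D_dijkstra = shortest_path(graph, method="D", directed=False)
D_johnson = shortest_path(graph, method="J", directed=False)
```

Both calls return the same matrix of global distances, so the choice between them affects only the running time. Which one is faster depends on the implementation and on the sparsity of the contact graph, so timings measured in one environment should not be assumed to carry over to another.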

Additional Information

How to cite this article: Hirata, Y. et al. Three-dimensional reconstruction of single-cell chromosome structure using recurrence plots. Sci. Rep. 6, 34982; doi: 10.1038/srep34982 (2016).

References

1. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
2. Duan, Z. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010).
3. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
4. Nagano, T. et al. Single cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013).
5. Nagano, T. et al. Single-cell Hi-C for genome-wide detection of chromatin interactions that occur simultaneously in a single cell. Nat. Protoc. 10, 1986–2003 (2015).
6. Marti-Renom, M. A. & Mirny, L. A. Bridging the resolution gap in structural modeling of 3D genome organization. PLoS Comp. Biol. 7, e1002125 (2011).
7. Lesne, A., Riposo, J., Roger, P., Cournac, A. & Mozziconacci, J. 3D genome reconstruction from chromosomal contacts. Nat. Methods 11, 1141–1143 (2014).
8. Eckmann, J. P., Kamphorst, S. O. & Ruelle, D. Recurrence plots of dynamical systems. Europhys. Lett. 4, 973–977 (1987).
9. Marwan, N., Romano, M. C., Thiel, M. & Kurths, J. Recurrence plots for the analysis of complex systems. Phys. Rep. 438, 237–329 (2007).
10. Thiel, M., Romano, M. C. & Kurths, J. How much information is contained in a recurrence plot? Phys. Lett. A 330, 343–349 (2004).
11. Hirata, Y., Horai, S. & Aihara, K. Reproduction of distance matrices and original time series from recurrence plots and their applications. Eur. Phys. J. Special Topics 164, 13–22 (2008).
12. Tanio, M., Hirata, Y. & Suzuki, H. Reconstruction of driving forces through recurrence plots. Phys. Lett. A 373, 2031–2040 (2009).
13. Hirata, Y., Komuro, M., Horai, S. & Aihara, K. Faithfulness of recurrence plots: a mathematical proof. Int. J. Bifurcation Chaos 25, 1550168 (2015).
14. Khor, A. & Small, M. Examining k-nearest neighbour networks: Superfamily phenomena and inversion. Chaos 26, 043101 (2016).
15. Dijkstra, E. W. A note on two problems in connexion with graphs. Numerische Mathematik 1, 269–271 (1959).
16. Gower, J. C. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325–338 (1966).
17. Paulsen, J., Gramstad, O. & Collas, P. Manifold based optimization for single-cell 3D genome reconstruction. PLoS Comp. Biol. 11, e1004396 (2015).
18. Lorenz, E. N. Deterministic nonperiodic flow. J. Atmos. Sci. 20, 130–141 (1963).
19. Rössler, O. E. An equation for continuous chaos. Phys. Lett. A 57, 397–398 (1976).
20. Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation captures and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).
21. Rousseau, M., Fraser, J., Ferraiuolo, M. A., Dostie, J. & Blanchette, M. Three-dimensional modeling of chromatin structure from interaction frequency data using Markov chain Monte Carlo sampling. BMC Bioinformatics 12, 414 (2011).
22. Baù, D. & Marti-Renom, M. A. Genome structure determination via 3C-based data integration by the integrative modeling platform. Methods 58, 300–306 (2012).
23. Meluzzi, D. & Arya, G. Recovering ensembles of chromatin conformations from contact probabilities. Nucleic Acids Res. doi:10.1093/nar/gks1029 (2012).
24. Zhang, Z., Li, G., Toh, K. C. & Sung, W. K. 3D chromosome modeling with semi-definite programming and Hi-C data. J. Comput. Biol. 20, 831–846 (2013).
25. Peng, C. et al. The sequencing bias relaxed characteristics of Hi-C derived data and implications for chromatin 3D modeling. Nucleic Acids Res. 41, e183 (2013).
26. Hu, M. et al. Bayesian inference of spatial organization of chromosomes. PLoS Comput. Biol. 9, e1002893 (2013).
27. Varoquaux, N., Ay, F., Noble, W. S. & Vert, J. P. A statistical approach for inferring the 3D structure of the genome. Bioinformatics 30, i26–i33 (2014).
28. Giorgetti, L. et al. Predictive polymer modeling reveals coupled fluctuations in chromosome conformation and transcription. Cell 157, 950–963 (2014).
29. Serra, F. et al. Restraint-based three-dimensional modeling of genomes and genomic domains. FEBS Lett. 589, 2987–2995 (2015).
30. Levandowsky, M. & Winter, D. Distance between sets. Nature 234, 34–35 (1971).
31. Johnson, D. B. Efficient algorithms for shortest paths in sparse networks. Journal of the ACM 24, 1–13 (1977).

Acknowledgements

We thank Dr. Jonas Paulsen for making the code of ref. 17 available online. In addition, we thank Mr. Keita Oda for helping us to bridge the gap between the applied mathematicians and the molecular biologists. Moreover, we are grateful to Prof. Kazunori Yamaguchi for discussing how to speed up our algorithm. This research was supported by the Platform Project for Supporting Drug Discovery and Life Science Research (Platform for Dynamic Approaches to Living Systems) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) and the Japan Agency for Medical Research and Development (AMED). The research of K.A. was also supported by JSPS KAKENHI Grant number 15H05707.

Author information

Hirata Yoshito and Oda Arisa contributed equally to this work.

Authors and Affiliations

Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Yoshito Hirata & Kazuyuki Aihara
Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, 153-8902, Tokyo, Japan
Arisa Oda & Kunihiro Ohta

Contributions

Y.H., A.O., K.O. and K.A. designed the research. Y.H. implemented the algorithm and analyzed experimental data. A.O. produced most of the figures. Y.H., A.O., K.O. and K.A. wrote the paper.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Received: 19 May 2016
Accepted: 21 September 2016
Published: 11 October 2016
DOI: https://doi.org/10.1038/srep34982
