Abstract
Single-cell analysis of the three-dimensional (3D) chromosome structure can reveal cell-to-cell variability in genome activities. Here, we propose to apply recurrence plots, a mathematical method of nonlinear time series analysis, to reconstruct the 3D chromosome structure of a single cell based on information of chromosomal contacts from genome-wide chromosome conformation capture (Hi-C) data. This recurrence plot-based reconstruction (RPR) method enables rapid reconstruction of a unique structure in single cells, even from incomplete Hi-C information.
Introduction
Through the development of measurement techniques, we are beginning to see that chromosomal structures in individual cells may be differentially organized to exhibit distinct genome and chromatin activities. Such cell-to-cell variability might be an important aspect of cell physiology during differentiation or cellular responses. Genome-wide chromosome conformation capture (Hi-C) analysis^{1,2,3} has been developed to estimate the three-dimensional (3D) chromosome structure, allowing greater accuracy of detection for local chromosomal contacts in a single cell^{4,5}. However, single-cell Hi-C is normally based on limited information of DNA contacts owing to the limited recovery of nuclear DNA from a single cell, whereas with regular Hi-C data one can also employ read counts, which enable more detailed estimation. Reconstruction of the 3D chromosome structure from such incomplete data is often difficult and requires long calculation times, even with high-performance computers.
In a mathematical sense, reconstruction of the 3D chromosome structure from Hi-C data can be considered a problem of geometry. Therefore, many ad hoc computational methods have been proposed^{6,7} for reconstructing the 3D chromosome structure from Hi-C data. However, such methods have several disadvantages. First, there is no mathematical support that guarantees the correctness of the estimation. Second, such methods are often formulated as stochastic optimizations, and thus, their solutions are not unique. Finally, we sometimes need to use additional information other than the contact information for every pair of DNA segments.
Here, to overcome such difficulties, we propose a new mathematical method using recurrence plots^{8,9} for reconstruction of the 3D chromosome structure. This method, referred to as the recurrence plot-based reconstruction (RPR) method, enables the reconstruction of a unique 3D chromosome structure solely from contact map information obtained by Hi-C analysis. In addition, the RPR method is applicable even to low-coverage chromosomal contact Hi-C data from a single cell. The algorithm can indeed reconstitute high-resolution whole chromosomal structures, even in the case of a single mammalian cell with a large genome.
Methods
A recurrence plot^{8,9} is a tool of nonlinear time series analysis for visualizing temporal patterns within a time series. It is a two-dimensional plot, and both axes show the same time axis. In a recurrence plot, one can show whether or not the states corresponding to each pair of times are close to each other in state space (see Fig. 1). Mathematically, a recurrence plot R can be defined as follows:
$$R(i, j) = \begin{cases} 1, & \text{if } \|x_{i} - x_{j}\| < \varepsilon, \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$
where x_{i} is a state corresponding to time i, and ε is a predefined threshold. We plot a point at (i, j) when R(i, j) = 1 and nothing there if R(i, j) = 0 (see Fig. 1). Thus, the recurrence plot displays the closeness between every pair of points, similar to Hi-C data. Therefore, if we regard chromosomal positions as time points on a time axis, we can naturally treat Hi-C data as a recurrence plot.
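As an illustration of this definition, the following minimal sketch computes a binary recurrence plot from a short trajectory. It assumes that closeness is measured with the Euclidean norm; the function name `recurrence_plot` and the toy circular trajectory are hypothetical and are not taken from the paper.

```python
import numpy as np


def recurrence_plot(states, eps):
    """Return R with R[i, j] = 1 when states i and j are closer than eps.

    Assumption: closeness is measured with the Euclidean norm.
    """
    states = np.asarray(states, dtype=float)
    if states.ndim == 1:
        states = states[:, None]  # treat a scalar series as one-dimensional states
    diff = states[:, None, :] - states[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < eps).astype(np.uint8)


# Hypothetical toy trajectory: 50 points on the unit circle.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
x = np.column_stack([np.cos(t), np.sin(t)])
R = recurrence_plot(x, eps=0.3)
```

In the Hi-C setting no such thresholding of coordinates is performed; the binary contact map itself plays the role of R.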
Several techniques^{10,11,12} have been developed for the recovery of a rough shape of the original time series solely from the information of its recurrence plot. However, we have chosen to focus on the RPR method we previously proposed in 2008^{11} because (i) it works even if the original time series is multivariate, and (ii) we have proven a theorem^{13} that the metric space recovered using our prior method^{11} is equivalent to the original Euclidean metric under mild conditions. In addition, the RPR method is known to be rather robust, even if we change the definition of closeness^{14}. Therefore, we used the method to reconstruct the 3D chromosome structure from Hi-C data.
The proposed procedure according to our previous method^{11} can be roughly summarized as shown in Figs 2 and 3. First, we convert Hi-C data into a weighted graph (see Figs 2 and 3). In this graph, each node corresponds to a part of a chromosome, and we connect two nodes if the Hi-C data show that the corresponding two parts are located in close proximity (beneath the defined threshold). In addition, we assign the following local distance to the edge between the two nodes i and j:
$$d(i, j) = 1 - \frac{|G_{i} \cap G_{j}|}{|G_{i} \cup G_{j}|}, \tag{2}$$
where G_{i} = {k | R(k, i) = 1} is the set of indices plotted in the ith column of the recurrence plot and thus of the corresponding Hi-C data (see Supplementary Texts A to understand intuitively that Eq. (2) gives us the original local distance between points i and j when they are close to each other). Second, we obtain the shortest distance by the Dijkstra method^{15} for every pair of nodes to obtain a distance matrix of global distances. Third, we use multidimensional scaling^{16} for visualizing the information of the distance matrix. By choosing the three largest eigenvalue components of the multidimensional scaling, we can reconstruct the 3D chromosome structure (see Figs 2 and 3). This RPR method is developed especially for low-coverage data such as single-cell Hi-C analysis. The key point of this algorithm is that it treats the contact maps as binary without coverage information. Namely, if two corresponding segments of chromosomes are detected as neighbors at least once within a Hi-C dataset, we assign 1 to the corresponding two elements of the contact map, while if the two segments are not neighbors, we assign 0 to the corresponding elements of the contact map. Thus, this definition of the binary contact map is universal for both single-cell Hi-C data and regular Hi-C data and does not need a threshold for the binarization. The threshold ε for the recurrence plot is defined intrinsically by the experimental limits of detectable contacts by the sparse Hi-C method as well as the regular Hi-C method. When the coverage depth is too high, the regular Hi-C data may be preprocessed to reduce the number of counts. However, this preprocessing will presumably not be necessary because the proposed RPR method can be used even when as many as 70% of the elements of the binary contact map are 1 (see ref. 11).
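The three steps above can be sketched in a few lines. This is not the authors' Matlab implementation (see Supplementary Codes) but a hedged Python illustration built on NumPy and SciPy. It assumes that the local distance takes the Jaccard-type form 1 − |G_i ∩ G_j| / |G_i ∪ G_j|, that the contact map is symmetric and connected, and that classical multidimensional scaling is used; the names `local_distances` and `rpr_reconstruct`, the tiny weight floor, and the helix test curve are all hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path


def local_distances(R):
    """Jaccard-type distance between columns i and j of a binary contact map."""
    R = (np.asarray(R) > 0).astype(float)
    inter = R.T @ R                                   # |G_i ∩ G_j|
    sizes = R.sum(axis=0)                             # |G_i|
    union = sizes[:, None] + sizes[None, :] - inter   # |G_i ∪ G_j|
    ratio = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    return 1.0 - ratio


def rpr_reconstruct(R, n_dims=3):
    """Contact map -> weighted graph -> global distances -> 3D coordinates."""
    R = np.asarray(R) > 0
    n = R.shape[0]
    d = local_distances(R)
    # Step 1: edges only between contacting pairs, weighted by the local distance.
    rows, cols = np.nonzero(R & ~np.eye(n, dtype=bool))
    # Assumption: a tiny floor keeps zero-distance edges present in the sparse graph.
    weights = np.maximum(d[rows, cols], 1e-9)
    graph = csr_matrix((weights, (rows, cols)), shape=(n, n))
    # Step 2: all-pairs shortest paths (Dijkstra) give the global distances.
    D = shortest_path(graph, method="D", directed=False)
    if np.isinf(D).any():
        raise ValueError("contact graph is disconnected")
    # Step 3: classical multidimensional scaling, keeping the largest eigenvalues.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    evals, evecs = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:n_dims]
    coords = evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))
    return D, coords


# Hypothetical test object: a two-turn helix sampled at 60 points.
t = np.linspace(0.0, 4.0 * np.pi, 60)
curve = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])
pair_dist = np.linalg.norm(curve[:, None, :] - curve[None, :, :], axis=-1)
R = (pair_dist < 0.8).astype(np.uint8)
D, coords = rpr_reconstruct(R)
```

The recovered coordinates are defined only up to rotation, reflection, and overall scale, so comparisons with a reference shape are best made on distance matrices rather than on raw coordinates.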
A necessary condition for our previous method^{11} is that all nodes are connected with each other in the weighted graph, i.e., there is a path connecting every pair of nodes. To fulfill this necessary condition, we used a previously described method^{12} to identify successive parts of a chromosome as neighbors and declare that R(i, i + 1) = R(i + 1, i) = 1 when the parts corresponding to i and (i + 1) are on the same chromosome. Using this technique, we can ensure that the proposed reconstruction is possible in most cases. If, instead, we set d(i, j) to a constant value when regions i and j have a contact or |i − j| ≤ 1, the RPR method becomes more similar to the method of Paulsen et al.^{17}.
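The binarization and the connectivity condition can be illustrated as follows. This is a minimal sketch under stated assumptions: contacts are given as pairs of bin indices, each bin carries a chromosome label, and the diagonal is set to 1 because every bin is trivially close to itself. The names `binary_contact_map`, `is_connected`, `contact_pairs`, and `chrom_of_bin` are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def binary_contact_map(contact_pairs, chrom_of_bin):
    """Build a symmetric 0/1 contact map and link successive bins of a chromosome."""
    chrom_of_bin = np.asarray(chrom_of_bin)
    n = chrom_of_bin.size
    R = np.eye(n, dtype=np.uint8)      # assumption: each bin is close to itself
    for i, j in contact_pairs:         # a pair seen at least once becomes 1
        R[i, j] = R[j, i] = 1
    # R(i, i+1) = R(i+1, i) = 1 when bins i and i+1 lie on the same chromosome.
    idx = np.nonzero(chrom_of_bin[:-1] == chrom_of_bin[1:])[0]
    R[idx, idx + 1] = 1
    R[idx + 1, idx] = 1
    return R


def is_connected(R):
    """True when every pair of bins is joined by a path of contacts."""
    n_components, _ = connected_components(csr_matrix(R), directed=False)
    return n_components == 1


# Hypothetical example: two chromosomes with three bins each.
chrom_of_bin = [0, 0, 0, 1, 1, 1]
R_intra = binary_contact_map([(0, 2)], chrom_of_bin)
R_inter = binary_contact_map([(0, 2), (2, 4)], chrom_of_bin)
```

In this toy case the backbone links alone connect each chromosome internally, and a single inter-chromosomal contact is enough to connect the whole graph.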
Thus, in the same way that one reconstructs the three-dimensional structure of strange attractors of deterministic chaos, as shown in Fig. 2, we reconstruct the three-dimensional structure of chromosomes by regarding a contact map of Hi-C data as a recurrence plot. Figure 3 shows the similarity between the two reconstruction processes.
Results
Single-cell Hi-C cannot report all the chromosomal interactions within a cell. In addition, sequencing data sometimes include errors. We therefore tested with toy models whether the RPR method is tolerant to contamination by false-positive noise and/or to the lack of information, as is often found in actual Hi-C data. Here we used the Lorenz model^{18} and the Rössler model^{19}, which are popular deterministic chaos models, as examples of three-dimensional objects: like chromosomes, they lie in three-dimensional space, and thus the mathematical background for reconstructing their three-dimensional structures from their recurrence plots is similar to that for reconstructing chromosomes from their contact maps. Supplementary Figure 1 demonstrates the results for the two models with 1% bit flips and 90% random loss of information on the closeness. Correlations between the original data and the noisy data remained as high as 0.70–0.92 in the test. Furthermore, datasets with only 0.2% of the closest interaction information were used to mimic single-cell Hi-C data. We could then visually recognize the topological similarities between the original shapes (Supplementary Figure 1a,f) and the reconstructions (Supplementary Figure 1e,j) using the limited information, suggesting that this method is sufficiently tolerant.
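A perturbation of this kind can be sketched as below. The exact noise protocol is described in the supplementary material and is not reproduced here; this sketch assumes that bit flips are applied first to the off-diagonal entries, that "loss of information" means that a random subset of entries is then set to 0, and that symmetry is preserved. The function name `perturb_contact_map` and the banded toy map are hypothetical.

```python
import numpy as np


def perturb_contact_map(R, flip_fraction=0.01, loss_fraction=0.9, seed=0):
    """Apply random bit flips, then random loss, to a symmetric binary map."""
    rng = np.random.default_rng(seed)
    R = np.asarray(R).astype(bool)
    n = R.shape[0]
    iu, ju = np.triu_indices(n, k=1)              # work on the upper triangle only
    bits = R[iu, ju].copy()
    bits ^= rng.random(bits.size) < flip_fraction     # false positives and negatives
    bits &= ~(rng.random(bits.size) < loss_fraction)  # assumption: lost entries read as 0
    out = np.eye(n, dtype=bool)                   # keep the diagonal at 1
    out[iu, ju] = bits
    out[ju, iu] = bits                            # mirror to keep the map symmetric
    return out.astype(np.uint8)


# Hypothetical toy map: each of 30 points is close to its two neighbors on each side.
index = np.arange(30)
R = (np.abs(index[:, None] - index[None, :]) <= 2).astype(np.uint8)
noisy = perturb_contact_map(R)
```

With 90% loss, successive bins may become disconnected, so a connectivity step such as declaring neighboring bins to be in contact would typically be needed before reconstruction.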
Finally, we show an example of estimation for X chromosomes in male mouse T_{H}1 cells from Hi-C data for different single cells and from ensemble (regular) Hi-C data (Fig. 4)^{5}. We first reconstructed distance information as distance matrices to reconstruct the 3D structure of each X chromosome at two different resolutions (500 and 250 kb; Fig. 4a,b). Between the two reconstructed distances at 500- and 250-kb resolutions, we obtained correlation coefficients of 0.9598, 0.9492, and 0.9481 for cells 1, 2, and 3, respectively. Therefore, our reconstructions seemed to be consistent and reasonable. In addition, at the 250-kb resolution, we obtained correlation coefficients of 0.7802, 0.7456, and 0.7405 between the reconstructed distributions of cell 1 and cell 2, those of cell 1 and cell 3, and those of cell 2 and cell 3, respectively. When we applied multidimensional scaling^{16} to the reconstructed distances, we obtained 3D chromosome structures (Fig. 4c) with a common feature, i.e., one of the X chromosome telomeres and an open loop were protruding from a cluster of other parts of the X chromosome. The topological structure that we found coincided well with that in Fig. 3A of Paulsen et al.^{17}, where the same Hi-C data were used to reconstruct the three-dimensional structures.
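A comparison of two reconstructed distance matrices can be sketched as follows. The text does not state how the coefficients were computed, so this sketch assumes a Pearson correlation over the off-diagonal upper-triangular entries and, for comparing resolutions, a block average that coarsens the finer matrix by a factor of two. Both `distance_matrix_correlation` and `coarsen` are hypothetical helpers.

```python
import numpy as np


def distance_matrix_correlation(D1, D2):
    """Pearson correlation between the upper-triangular entries of two matrices."""
    D1 = np.asarray(D1, dtype=float)
    D2 = np.asarray(D2, dtype=float)
    if D1.shape != D2.shape:
        raise ValueError("distance matrices must have the same shape")
    iu = np.triu_indices(D1.shape[0], k=1)
    return float(np.corrcoef(D1[iu], D2[iu])[0, 1])


def coarsen(D, factor=2):
    """Average factor-by-factor blocks, e.g. to map 250-kb bins onto 500-kb bins."""
    D = np.asarray(D, dtype=float)
    m = D.shape[0] // factor
    D = D[: m * factor, : m * factor]   # drop a trailing incomplete block
    return D.reshape(m, factor, m, factor).mean(axis=(1, 3))


# Hypothetical example: distances along a line at fine and coarse resolution.
fine = np.abs(np.subtract.outer(np.arange(6), np.arange(6))).astype(float)
coarse = coarsen(fine, factor=2)
r = distance_matrix_correlation(fine, 2.0 * fine + 1.0)
```

Because the reconstruction fixes distances only up to scale, a correlation coefficient is a natural choice here, as it is unchanged by positive rescaling of either matrix.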
We further applied the RPR method to cells 4–10 of ref. 5 at 250-kbp resolution (Supplementary Figure 2) as well as cells 1–10 of ref. 5 at 50-kbp resolution (see Fig. 5 and Supplementary Figure 3). Despite some common features such as an open loop at a position of 30 Mb in all cells, careful inspection reveals some cell-to-cell variability in the open loops seen at positions of 70 or 145 Mb. We further confirmed that application of the RPR method to the ensemble Hi-C data could reproduce similar features (Figs 4 and 5, and Supplementary Figure 3).
There is some cell-to-cell variability among these cells. To demonstrate this variability, we focused on the similarities and differences between the local structures at the scale of 1 Mb by taking “3D correlation coefficients” (see Supplementary Texts B for the details of the calculations). Figure 6a shows the correlation of local structures between every pair of the 10 individual cells, while Fig. 6b shows the correlation between the ensemble and each individual cell. The cell-to-cell variability is small where the reconstruction for each cell agrees with that of the ensemble (see Fig. 6a,b). This agreement means that the 3D structure obtained from ensemble Hi-C, i.e., from the analysis of a mass of cells, shows the common features that were observed in most of the ten individual cells by single-cell Hi-C. On the other hand, structural fluctuation was observed at other parts of chromosomes where the topology for each cell is different from that of the ensemble. This structural fluctuation suggests the potential of single-cell Hi-C analysis to show chromosome structure variability that cannot be seen in ensemble Hi-C. Indeed, Fig. 6c shows how the topological structures are similar and different among individual cells.
The Th1 ensemble Hi-C data and single-cell Hi-C data were acquired from the GEO database (http://www.ncbi.nlm.nih.gov/geo/, accession GSE48262; GSM1173492 for the ensemble Hi-C data, and from GSM1173493 to GSM1173502 for the 10 single-cell Hi-C data).
Discussions
The RPR method is qualitatively different from other approaches^{2,7,20,21,22,23,24,25,26,27,28} to chromosomal structural reconstruction (except for that of Paulsen et al.^{17}) in that it uses single-cell Hi-C data without any consideration of read counts^{29}. The method of Paulsen et al.^{17} is reported as an analysis appropriate for single-cell Hi-C data. While the method of Paulsen et al.^{17} only uses the information of whether two segments of chromosomes are close or not, connecting them by an edge with a constant local distance, the RPR method can reflect information from distant segments when connecting two segments by an edge: it refines how close they are by reconstructing the local distance by Eq. (2), which is strongly related to the Jaccard coefficient^{30} (see also Supplementary Texts A). Thus, the RPR method potentially enables us to reproduce the metric space and thus rationally estimate three-dimensional chromosome structures, especially from very low-coverage Hi-C data such as single-cell Hi-C. To overcome the rough approximation at each local distance, the method of Paulsen et al.^{17} intentionally puts less weight on distantly located pairs to reduce the influence of noise, since a long path composed of many edges is less reliable. On the other hand, the RPR method currently has no special mechanism to reduce such noise, because it is tolerant to noise and lack of information, as shown in Supplementary Figure 1: a small fraction of bit flips or missing bits due to such noise causes only small differences in the evaluations of Eq. (2) and thus in the reconstructed topological structures. Hence, if we combine the noise-reducing mechanism of Paulsen et al.^{17} for long paths with the weighted graph of the RPR method, we may be able to reproduce the 3D structure of chromosomes more accurately. This direction remains an open problem.
Another important advantage of the RPR method is that it provides a unique three-dimensional reconstruction. This is because (i) the proposed method converts a contact map to a distance matrix in a deterministic way, and (ii) the classical multidimensional scaling implemented in Matlab gives a unique configuration for a spatial arrangement of points given their distance matrix.
In addition, the RPR method has other advantages in comparison with the conventional methods^{2,7,17,20,21,22,23,24,25,26,27,28} such as the methods of Lesne et al.^{7} and Paulsen et al.^{17}. First, the RPR method only needs the information of the contact map, whereas the method of Lesne et al.^{7} requires additional information such as the frequencies of pairs or read counts; hence, the RPR method is suitable for Hi-C data with few read counts. Second, the proposed method is based on mathematical proofs^{13,14} and seems to work well with larger datasets. For example, the RPR method functioned well even for a large-scale dataset with over 10,000 points^{3}, whereas the example with Lesne’s method handled only up to 1,000 points^{7}. Third, the proposed method can be carried out within a reasonable time on an easily available computer. For example, it takes less than an hour to reconstruct the 3D chromosome structure for a whole set of chromosomes of a single cell at 250-kb resolution if we replace the Dijkstra method with the Johnson method^{31}, even when we use a conventional computer with 2 × 2.66 GHz 6-Core Intel Xeon processors and 64 GB of memory with codes implemented in Matlab (Supplementary Codes). To make this comparison reasonable and fair, we used the codes provided by the link of ref. 17 and reproduced the structure of the X chromosome at 50-kbp resolution using the same computer. We found that the method of ref. 17 needed 764 seconds, while the proposed RPR method needed 157 seconds and reproduced the typical topology appropriately. (Here we replaced the Dijkstra method with the Johnson^{31} method implemented in Matlab’s graphallshortestpaths function. This function for finding the shortest path lengths for all pairs of nodes is originally used in the codes of ref. 17, and thus we decided to employ it for speeding up; see Supplementary Table 1 and Supplementary Figure 4 for further comparisons.)
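The swap of shortest-path algorithms can be illustrated with SciPy, used here only as a stand-in for Matlab’s graphallshortestpaths. The sketch builds a hypothetical sparse weighted graph (a chain with random extra edges, which is not Hi-C data) and checks that the Dijkstra and Johnson methods return the same all-pairs distances; only their running times on large sparse graphs are expected to differ.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(0)
n = 200

# Hypothetical sparse graph: a chain of successive nodes plus random extra edges.
W = np.zeros((n, n))
chain = np.arange(n - 1)
W[chain, chain + 1] = rng.uniform(0.1, 1.0, size=n - 1)
extra = rng.integers(0, n, size=(300, 2))
for i, j in extra:
    if i != j:
        W[i, j] = rng.uniform(0.1, 1.0)
W = np.maximum(W, W.T)   # make the weights symmetric
graph = csr_matrix(W)    # zero entries are treated as missing edges

# Same all-pairs distances from two algorithms.
D_dijkstra = shortest_path(graph, method="D", directed=False)
D_johnson = shortest_path(graph, method="J", directed=False)
```

Since all edge weights are positive, the choice between the two methods affects only the computation time, not the resulting distance matrix.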
Thus, the proposed method is expected to provide a breakthrough for the reconstruction of the 3D chromosomal structure from Hi-C data.
Additional Information
How to cite this article: Hirata, Y. et al. Three-dimensional reconstruction of single-cell chromosome structure using recurrence plots. Sci. Rep. 6, 34982; doi: 10.1038/srep34982 (2016).
References
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Duan, Z. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010).
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013).
Nagano, T. et al. Single-cell Hi-C for genome-wide detection of chromatin interactions that occur simultaneously in a single cell. Nat. Protoc. 10, 1986–2003 (2015).
Marti-Renom, M. A. & Mirny, L. A. Bridging the resolution gap in structural modeling of 3D genome organization. PLoS Comp. Biol. 7, e1002125 (2011).
Lesne, A., Riposo, J., Roger, P., Cournac, A. & Mozziconacci, J. 3D genome reconstruction from chromosomal contacts. Nat. Methods 11, 1141–1143 (2014).
Eckmann, J. P., Kamphorst, S. O. & Ruelle, D. Recurrence plots of dynamical systems. Europhys. Lett. 4, 973–977 (1987).
Marwan, N., Romano, M. C., Thiel, M. & Kurths, J. Recurrence plots for the analysis of complex systems. Phys. Rep. 438, 237–329 (2007).
Thiel, M., Romano, M. C. & Kurths, J. How much information is contained in a recurrence plot? Phys. Lett. A 330, 343–349 (2004).
Hirata, Y., Horai, S. & Aihara, K. Reproduction of distance matrices and original time series from recurrence plots and their applications. Eur. Phys. J. Special Topics 164, 13–22 (2008).
Tanio, M., Hirata, Y. & Suzuki, H. Reconstruction of driving forces through recurrence plots. Phys. Lett. A 373, 2031–2040 (2009).
Hirata, Y., Komuro, M., Horai, S. & Aihara, K. Faithfulness of recurrence plots: a mathematical proof. Int. J. Bifurcation Chaos 25, 1550168 (2015).
Khor, A. & Small, M. Examining k-nearest neighbour networks: Superfamily phenomena and inversion. Chaos 26, 043101 (2016).
Dijkstra, E. W. A note on two problems in connexion with graphs. Numerische Mathematik 1, 269–271 (1959).
Gower, J. C. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325–338 (1966).
Paulsen, J., Gramstad, O. & Collas, P. Manifold based optimization for singlecell 3D genome reconstruction. PLoS Comp. Biol. 11, e1004396 (2015).
Lorenz, E. N. Deterministic nonperiodic flow. J. Atmos. Sci. 20, 130–141 (1963).
Rössler, O. E. An equation for continuous chaos. Phys. Lett. A 57, 397–398 (1976).
Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).
Rousseau, M., Fraser, J., Ferraiuolo, M. A., Dostie, J. & Blanchette, M. Three-dimensional modeling of chromatin structure from interaction frequency data using Markov chain Monte Carlo sampling. BMC Bioinformatics 12, 414 (2011).
Baù, D. & Marti-Renom, M. A. Genome structure determination via 3C-based data integration by the integrative modeling platform. Methods 58, 300–306 (2012).
Meluzzi, D. & Arya, G. Recovering ensembles of chromatin conformations from contact probabilities. Nucleic Acids Res. doi:10.1093/nar/gks1029 (2012).
Zhang, Z., Li, G., Toh, K. C. & Sung, W. K. 3D chromosome modeling with semi-definite programming and Hi-C data. J. Comput. Biol. 20, 831–846 (2013).
Peng, C. et al. The sequencing bias relaxed characteristics of Hi-C derived data and implications for chromatin 3D modeling. Nucleic Acids Res. 41, e183 (2013).
Hu, M. et al. Bayesian inference of spatial organization of chromosomes. PLoS Comput. Biol. 9, e1002893 (2013).
Varoquaux, N., Ay, F., Noble, W. S. & Vert, J. P. A statistical approach for inferring the 3D structure of the genome. Bioinformatics 30, i26–i33 (2014).
Giorgetti, L. et al. Predictive polymer modeling reveals coupled fluctuations in chromosome conformation and transcription. Cell 157, 950–963 (2014).
Serra, F. et al. Restraint-based three-dimensional modeling of genomes and genomic domains. FEBS Lett. 589, 2987–2995 (2015).
Levandowsky, M. & Winter, D. Distance between sets. Nature 234, 34–35 (1971).
Johnson, D. B. Efficient algorithms for shortest paths in sparse networks. Journal of the ACM 24, 1–13 (1977).
Acknowledgements
We thank Dr. Jonas Paulsen for making the codes of ref. 17 available online. In addition, we thank Mr. Keita Oda for helping us to fill in the gaps between the applied mathematicians and molecular biologists. Moreover, we are grateful to Prof. Kazunori Yamaguchi for discussing how to speed up our algorithm. This research was supported by the Platform Project for Supporting Drug Discovery and Life Science Research (Platform for Dynamic Approaches to Living Systems) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) and the Japan Agency for Medical Research and Development (AMED). The research of K.A. was also supported by JSPS KAKENHI Grant number 15H05707.
Author information
Contributions
Y.H., A.O., K.O. and K.A. designed the research. Y.H. implemented the algorithm and analyzed experimental data. A.O. produced most of the figures. Y.H., A.O., K.O. and K.A. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Hirata, Y., Oda, A., Ohta, K. et al. Three-dimensional reconstruction of single-cell chromosome structure using recurrence plots. Sci Rep 6, 34982 (2016). https://doi.org/10.1038/srep34982