Abstract
The spatiotemporal organization of DNA replication produces a highly robust and reproducible replication timing profile. Sequencing-based methods for assaying replication timing genome-wide have become commonplace, but regions of high repeat content in the human genome have remained refractory to analysis. Here, we report the first nearly-gapless telomere-to-telomere replication timing profiles in human, using the T2T-CHM13 genome assembly and sequencing data for five cell lines. We find that replication timing can be successfully assayed in centromeres and large blocks of heterochromatin. Centromeric regions replicate in mid-to-late S-phase and contain replication-timing peaks at a similar density to other genomic regions, while distinct families of heterochromatic satellite DNA differ in their bias for replicating in late S-phase. The high degree of consistency in centromeric replication timing across chromosomes within each cell line prompts further investigation into the mechanisms dictating that some cell lines replicate their centromeres earlier than others, and what the consequences of this variation are.
Introduction
Eukaryotic DNA replication initiation is organized in space and time, reflecting a reproducible DNA replication-timing program1. In general, late replication appears to be associated with a more repressive chromatin state: late-replicating regions tend to localize to the nuclear periphery2,3 and to broadly associate with the condensed “B” compartment in chromatin conformation capture assays4,5. Likewise, genes in late-replicating regions often have lower expression6,7, with corresponding histone methylation8,9 and deacetylation8,10, than genes in early-replicating regions. Constitutive heterochromatin, which is gene-poor and highly-condensed, is often described as late replicating11,12,13, although direct visualization by microscopy has classified five sequential nuclear localization patterns of nascently-replicated DNA, with euchromatic replication primarily occurring during the first wave2. While O’Keefe et al.2 used in situ hybridization probes to demonstrate that centromeric α-satellite DNA co-localized with nascent DNA in the third wave of replication, which types of heterochromatin replicate in the other waves remains uncharacterized. These results suggest that heterochromatin replication timing is more complicated than currently appreciated, and potentially point to the existence of distinct heterochromatin subtypes that differ in their replication timing.
Existing methods for measuring replication timing at genome scale14 are sequencing-based, making them reliant on the quality of reference genome assemblies. Notably, the current human reference genome (GRCh38/hg38) contains 151 Mb of unresolved gaps, represented as multi-megabase arrays of unknown sequence15. Thus, these regions—which include large pericentromeric regions on chromosomes 1, 9, and 16 and the entire p-arms of the five acrocentric chromosomes (chr13, chr14, chr15, chr21, chr22)—have been refractory to whole-genome analyses, including those of replication timing. In addition, hg38 contains statistically modeled sequences for the centromeric α-satellite DNA, which were designed as decoys for sequence alignment rather than to reflect the true linear sequence of these arrays16.
Centromeres, in particular, have been suggested to complicate the general association of heterochromatin with late replication timing: centromeres replicate in early S phase across multiple yeast species17,18,19,20 and in mid S phase in maize21. In humans, centromeric replication timing has primarily been reported as late replicating22,23,24, although it has also been reported to occur in mid S phase2. We previously reported25 that the centromeric sequence models in hg38 enabled preliminary analysis of replication timing for the majority of human centromeres by whole-genome sequencing. We found consistent evidence of replication-timing peaks within centromeric regions, suggesting that centromeres contain replication origins. We further demonstrated that centromeric replication occurs during mid-to-late S-phase and that its timing is highly divergent among cell lines. However, because the decoy sequences in hg38 were not linear assemblies of the centromeres, we were unable to analyze the precise locations of these peaks.
Here, we report nearly-gapless telomere-to-telomere replication timing profiles across all autosomes and the X chromosome. Using the telomere-to-telomere human genome assembly T2T-CHM13, recently published by the Telomere-to-Telomere Consortium15, we provide the first report of replication timing of constitutive heterochromatin in the context of the whole genome. The linear sequences for the centromeres in this genome assembly further enabled us to revisit and reaffirm our previous conclusions based on hg38, while also analyzing the locations of centromeric replication initiation sites.
Results and discussion
Telomere-to-telomere replication timing profiles
In our prior analysis25, we generated replication timing profiles for five cell lines—the apparently healthy lymphoblastoid cell line GM12878, the embryonic kidney cell line HEK293T, the ovarian carcinoma cell line A2780, and the breast cancer cell lines HCC1143 and HCC1954—by whole-genome sequencing of G1- and S-phase populations isolated by fluorescence-activated cell sorting (FACS). The G1-phase fraction was used to define variable-size uniform-coverage genomic windows, accounting for sequencing biases and copy-number variants, and then sequencing read depth was assessed for the S-phase fraction. After S/G1 normalization, fluctuations in S-phase read depth reflect only the effects of replication timing, such that early-replicating regions are more highly represented relative to late-replicating regions26.
T2T-CHM13 is a gapless human genome assembly for CHM13-hTERT, a telomerase reverse transcriptase-transformed cell line derived from a complete hydatidiform mole with a stable 46, XX karyotype15. Hydatidiform moles are formed during fertilization and contain only DNA from the sperm; thus CHM13-hTERT is homozygous, reducing the complexity of genome assembly. T2T-CHM13 was assembled from long-read PacBio circular consensus sequencing and polished with a combination of other short- and long-read sequencing methods. To assess whether this new assembly could be used to study the replication timing of heterochromatin, we generated replication timing profiles from the same sequencing libraries, re-aligning the sequencing reads for each cell line to T2T-CHM13. The resulting replication timing profiles were nearly gapless, with only the rDNA loci remaining unresolved (Fig. 1). (We note that CHM13-hTERT has an XX karyotype, as do all five cell lines studied. Thus, we did not consider the Y chromosome.) We validated these replication-timing profiles by comparison to the hg38-based replication timing profiles, using the UCSC Genome Browser liftOver tool to convert between hg38 and T2T-CHM13 coordinates. The profile for each cell line was virtually identical (r > 0.999) between genome builds for regions that could be successfully “lifted over” (i.e., the non-shaded regions in Fig. 1; 94.14% of the genome). We note that this approach for inferring the replication timing of heterochromatic regions necessitated the analysis of a G1 control sample and was not amenable to FACS-free inference of replication timing from genome sequence data27 (Supplementary Fig. 1).
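The between-build comparison described above can be illustrated with a minimal Python sketch. It assumes that replication timing values have already been assigned to shared region keys after coordinate conversion (the liftOver step itself is not shown); the function names `pearson_r` and `compare_builds` and the dictionary layout are hypothetical and are not taken from the original analysis.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def compare_builds(rt_lifted, rt_t2t):
    """Correlate two profiles over regions present in both.

    rt_lifted: dict mapping a region key to the hg38-based replication
               timing value, for regions that were successfully lifted over
               (assumed input format).
    rt_t2t:    dict mapping the same kind of key to the T2T-CHM13-based value.
    Returns (correlation, number of shared regions).
    """
    shared = sorted(set(rt_lifted) & set(rt_t2t))
    x = [rt_lifted[k] for k in shared]
    y = [rt_t2t[k] for k in shared]
    return pearson_r(x, y), len(shared)
```

Regions that are present in only one of the two dictionaries (for example, sequence that cannot be lifted over) are simply excluded from the correlation, mirroring the restriction to successfully lifted regions described in the text.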
Our telomere-to-telomere profiles revealed the replication timing of several large regions previously excluded from genomic analysis. This included the entire p-arms of the acrocentric chromosomes (except for the rDNA loci) and the large pericentromeric satellite arrays on chromosomes 1, 9, and 16. The replication timing profiles in each of these regions showed similar structure to the profiles for other genomic regions, with distinct local maxima and minima of varying amplitudes (Fig. 2a, b; Supplementary Fig. 2). Annotation of these new sequences28 indicated that these regions include several multi-megabase repeat arrays of distinct satellite sequences, including human satellite 1 (HSat1; 4.9 Mb on chr13p), human satellite 2 (HSat2; 13.2 Mb on chr1q, 12.7 Mb on chr16q), human satellite 3 (HSat3; 27.6 Mb on chr9, 8 Mb on chr15p), and β-satellite (1.9 Mb on chr22p). Within these larger satellite arrays, HSat1 appeared to replicate in mid-S phase, while HSat2 and HSat3 were later-replicating; we further characterize the replication timing of each satellite family, across all family members genome-wide, below.
Next, we visualized the centromeric regions. Using hg38, we previously reported that each centromeric region contains multiple replication timing peaks and that centromeric replication timing is not extremely late relative to the rest of the genome25. Although the linear centromeric sequences in T2T-CHM13 completely replace the decoy sequences in hg38, these results were reproduced here (Fig. 3; Supplementary Fig. 3; Fig. 4c). Additionally, we were able to meaningfully identify the locations of replication timing peaks within centromeric regions and to analyze their dynamics, as we present below (Fig. 5). Furthermore, satellite repeat elements within T2T-CHM13 centromeric regions are well-annotated28, enabling us to characterize the replication timing of the rapidly-evolving centromere-specific α-satellite DNA, which is present as canonical higher-order repeat arrays (HORs), divergent higher-order repeat arrays, and α-satellite monomers (presented in Fig. 4). Although many of the centromeric regions contain multiple HORs, only a subset is observed to bind kinetochore proteins and function in active centromere assembly29.
Replication timing bias of repetitive sequence elements
Between the acrocentric p-arms and the centromeric regions, T2T-CHM13 adds 395 Mb of densely annotated repeat-rich sequence whose replication timing has not been analyzed. Many of the annotated satellite sequences are relatively short (median: 7.25 Kb) and neighbored by sequences of other satellite families (Fig. 4a). Thus, we were interested to know whether these satellite families differed from one another in their replication timing: persistent patterns in replication timing of a family across multiple chromosome contexts could reflect some underlying property that controls when it replicates.
Indeed, satellite families did differ in both the median and range of replication timing values observed. Replication timing values for non-satellite sequence in these regions (annotated as “ct”) ranged from very early to very late, with a median somewhat later than the genome average (RT = −0.25 vs. −0.03; Fig. 4b). In contrast, each of the satellite sequence families was biased toward late replication, although none were exclusively late replicating (Fig. 4c). Notably, α-satellite HORs replicated earlier on average than human satellite 2 (HSat2) and human satellite 3 (HSat3), but later than human satellite 1 (HSat1). This is consistent with the notion that the active centromere is earlier replicating than its surrounding context, potentially to facilitate kinetochore loading onto both sister chromatids at the appropriate time during S-phase. Furthermore, late replication of HSat2 and HSat3, evolutionarily related satellites that form large blocks of constitutive heterochromatin, suggests that they may comprise the later waves of replication observed by microscopy2.
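A per-family summary of this kind can be sketched in a few lines of Python. The sketch assumes that each genomic window has already been labeled with a satellite family from the annotation track and paired with its replication timing value; the function name `median_rt_by_family` and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_rt_by_family(windows):
    """Group replication timing values by family and take the median.

    windows: iterable of (family_label, rt_value) pairs (assumed format).
    Returns a dict mapping each family label to its median RT value.
    """
    grouped = defaultdict(list)
    for family, rt in windows:
        grouped[family].append(rt)
    return {family: median(values) for family, values in grouped.items()}
```

Because the median is computed over all windows carrying a given label, regardless of chromosome, a family whose members replicate at similar times in different chromosomal contexts will show a narrow range around that median.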
Replication dynamics within centromeric regions
Identifying the locations of replication timing peaks within centromeric regions allowed us to next ask about replication dynamics within these regions. We used two metrics to assess replication dynamics: the distance between consecutive replication timing peaks as a proxy for inter-origin distance, and the slope between replication timing peaks and valleys as a proxy for replication fork speed. We observed that inter-peak distances were slightly longer in centromeric regions relative to the rest of the genome (median: 0.65 Mb in centromeric regions vs. 0.51 Mb genome-wide; Fig. 5a) and replication-timing slopes were similar (median: 0.89/Mb in centromeric regions vs. 0.88/Mb genome-wide; Fig. 5b). When looking specifically within α-satellite HORs, these trends were more pronounced (Fig. 5c, d). This could suggest that the active centromere poses a barrier to replication initiation and/or elongation, resulting in fewer origins firing and/or slower replication progression through these satellite arrays. However, there was substantial overlap between the distributions in all comparisons, indicating that many individual origins have similar dynamics in centromeric and non-centromeric regions. Thus, we favor the explanation that these differences are an artifact of the relatively sparser sequencing coverage of centromeric regions, resulting in an undercounting of centromeric peaks.
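The two metrics can be made concrete with a minimal Python sketch. It assumes a smoothed profile given as parallel lists of window positions (in Mb) and replication timing values, and it calls peaks and valleys by a simple strict local-extremum rule; the original text does not specify the peak-calling rule, so this rule and all function names here are assumptions.

```python
def local_extrema(rt):
    """Return indices of strict local maxima (peaks) and minima (valleys)."""
    peaks, valleys = [], []
    for i in range(1, len(rt) - 1):
        if rt[i] > rt[i - 1] and rt[i] > rt[i + 1]:
            peaks.append(i)
        elif rt[i] < rt[i - 1] and rt[i] < rt[i + 1]:
            valleys.append(i)
    return peaks, valleys


def inter_peak_distances(pos_mb, peaks):
    """Distance in Mb between consecutive peaks (proxy for inter-origin distance)."""
    return [pos_mb[b] - pos_mb[a] for a, b in zip(peaks, peaks[1:])]


def peak_valley_slopes(pos_mb, rt, peaks, valleys):
    """Absolute RT change per Mb between each adjacent peak/valley pair."""
    kind = {i: "peak" for i in peaks}
    kind.update({i: "valley" for i in valleys})
    ordered = sorted(kind)
    return [
        abs(rt[b] - rt[a]) / (pos_mb[b] - pos_mb[a])
        for a, b in zip(ordered, ordered[1:])
        if kind[a] != kind[b]  # only pair a peak with a neighboring valley
    ]
```

Under this rule, a peak that goes undetected (for instance, because coverage is sparse) merges two inter-peak intervals into one longer interval, which is the kind of undercounting effect discussed in the text.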
Centromeric replication timing varies consistently among cell lines
Finally, we considered differences between the five cell lines analyzed. Replication timing biases of individual satellite repeat families were consistent across cell lines (Fig. 6a). Likewise, inter-origin distances (Fig. 6b) and replication timing slopes (Fig. 6c) were comparable. We had previously observed that there were differences in average centromeric replication timing between these cell lines, such that the average centromeric region in A2780 and HEK293T was earlier-replicating and the average centromeric region in HCC1954 and HCC1143 was later-replicating25. Even though the replication timing profiles in these regions could not be “lifted over” between hg38 and T2T-CHM13, this trend was again observed in the T2T-CHM13 profiles (Fig. 6d). Using T2T-CHM13, we were further able to analyze replication timing of individual centromeric regions in each cell line. We found that the trend observed on average reflected a persistent pattern across chromosomes within each cell line, rather than being driven by the replication timing of a subset of centromeres (Fig. 6e).
Taken together, our results indicate that the T2T-CHM13 genome assembly provides a reliable tool for inference of nearly gapless telomere-to-telomere human replication timing profiles. These newly profiled regions confirm that heterochromatin is typically (but not exclusively) late replicating and reveal differences in replication timing biases of satellite repeat families. Linear centromeric reference sequences enabled us to further confirm our prior findings that centromeres replicate in mid-to-late S phase, are not unusually late replicating relative to the rest of the genome, and differ in their replication timing between cell lines. One biological mechanism that could potentially shape differences between cell lines is differential recruitment of the centromere-specific histone H3 variant CENP-A. Variation in HOR array length and sequence divergence has been shown to influence the competency of centromeric regions to recruit CENP-A30, and in vitro experiments suggest that depletion of CENP-A during S-phase results in replication fork stalling specifically at centromeres31. Thus, sequence and copy-number variation at centromeric regions among cell lines may alter the replication timing of individual chromosomes. However, by comparing centromeric regions within the same cell line, we demonstrate that earlier centromeric replication timing appears to be a global phenomenon impacting all chromosomes. An intriguing possibility is that centromeric replication is coordinated across chromosomes, perhaps by their nuclear localization: centromeres are strongly enriched for interchromosomal interactions in budding yeast32 and centromere location within the nucleus has been implicated in the maintenance of pluripotency in human embryonic stem cell lines33. In that scenario, advancing the replication timing of one centromere could have the impact of altering global centromeric replication timing. To our knowledge, such a mechanism has yet to be described.
Likewise, the consequences of divergent centromeric replication timing between cell lines remain unclear. Telomere-to-telomere replication timing profiles provide both the impetus and the tools for investigating these questions further.
Methods
Preparation of whole genome sequence data
All sequence data analyzed in this study were previously published in Massey et al.25. Tissue culture, fluorescence-activated cell sorting, library preparation, and sequencing are detailed in that publication.
Sequencing reads were re-aligned to the human genome assembly T2T-CHM13 v1.1 with the Burrows-Wheeler Aligner maximal exact matches (BWA-MEM) algorithm (bwa v0.7.13). Sequence annotations are from Altemose et al.28 and were downloaded from the UCSC Genome Browser (University of California, Santa Cruz; “cenSatAnnotation” track).
Replication timing profiles
Replication timing profiles were inferred by the S/G1 method described in Koren et al. (2012)26. Briefly, variable-size genomic bins were defined such that each bin had uniform coverage (200 reads) in the G1-phase library for a given cell line. Per-bin coverage was calculated for the corresponding S-phase library. The resulting profile was smoothed using a cubic smoothing spline (MATLAB function csaps, smoothing parameter 1 × 10⁻¹⁶), and normalized to an autosomal mean of 0 and standard deviation of 1.
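The binning and normalization steps can be illustrated with a minimal Python sketch. It is not the original MATLAB implementation: it assumes read start positions on a single chromosome are available as integer lists, omits the cubic smoothing spline step entirely, and uses hypothetical function names (`uniform_coverage_bins`, `s_phase_depth`, `zscore`).

```python
from bisect import bisect_left
from statistics import mean, pstdev


def uniform_coverage_bins(g1_positions, reads_per_bin=200):
    """Define contiguous half-open bins that each hold reads_per_bin G1 reads.

    A trailing group with fewer than reads_per_bin reads is dropped.
    Assumes G1 read positions are distinct at bin boundaries.
    """
    pos = sorted(g1_positions)
    bins = []
    for i in range(0, len(pos) - reads_per_bin + 1, reads_per_bin):
        j = i + reads_per_bin
        start = pos[i]
        # The bin ends where the next group of G1 reads begins.
        end = pos[j] if j < len(pos) else pos[-1] + 1
        bins.append((start, end))
    return bins


def s_phase_depth(s_positions, bins):
    """Count S-phase reads falling in each half-open bin [start, end)."""
    s = sorted(s_positions)
    return [bisect_left(s, end) - bisect_left(s, start) for start, end in bins]


def zscore(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]
```

Because every bin contains the same number of G1 reads, the S-phase count per bin is proportional to the S/G1 ratio, so bins with higher counts correspond to earlier-replicating sequence. In the described method the z-score is taken over autosomal bins after smoothing; here it is applied directly to the raw counts of whatever bins are passed in.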
Data availability
Sequence data analyzed in this study are available from the Sequence Read Archive (SRA) under accession number PRJNA419407. The reference assembly T2T-CHM13 v1.1 was downloaded from GitHub (https://github.com/marbl/CHM13). Chain files for liftOver (grch38.t2t-chm13-v1.1.over.chain; t2t-chm13-v1.1.grch38.over.chain) were obtained from the UCSC Genome Browser (https://t2t.gi.ucsc.edu/chm13/dev/t2t-chm13-v1.1/downloads/).
References
Fragkos, M., Ganier, O., Coulombe, P. & Mechali, M. DNA replication origin activation in space and time. Nat. Rev. Mol. Cell Biol. 16, 360–374. https://doi.org/10.1038/nrm4002 (2015).
O’Keefe, R. T., Henderson, S. C. & Spector, D. L. Dynamic organization of DNA replication in mammalian cell nuclei: spatially and temporally defined replication of chromosome-specific alpha-satellite DNA sequences. J. Cell Biol. 116, 1095–1110. https://doi.org/10.1083/jcb.116.5.1095 (1992).
Dimitrova, D. S. & Gilbert, D. M. The spatial position and replication timing of chromosomal domains are both established in early G1 phase. Mol. Cell 4, 983–993. https://doi.org/10.1016/s1097-2765(00)80227-0 (1999).
Ryba, T. et al. Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res. 20, 761–770. https://doi.org/10.1101/gr.099655.109 (2010).
Rivera-Mulia, J. C. et al. Allele-specific control of replication timing and genome organization during development. Genome Res. 28, 800–811. https://doi.org/10.1101/gr.232561.117 (2018).
Farkash-Amar, S. et al. Global organization of replication time zones of the mouse genome. Genome Res. 18, 1562–1570. https://doi.org/10.1101/gr.079566.108 (2008).
Hiratani, I. et al. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol. 6, e245. https://doi.org/10.1371/journal.pbio.0060245 (2008).
Ding, Q. et al. The genetic architecture of DNA replication timing in human pluripotent stem cells. Nat. Commun. 12, 6746. https://doi.org/10.1038/s41467-021-27115-9 (2021).
Du, Q. et al. DNA methylation is required to maintain both DNA replication timing precision and 3D genome organization integrity. Cell Rep. 36, 109722. https://doi.org/10.1016/j.celrep.2021.109722 (2021).
Goren, A., Tabib, A., Hecht, M. & Cedar, H. DNA replication timing of the human beta-globin domain is controlled by histone modification at the origin. Genes Dev. 22, 1319–1324. https://doi.org/10.1101/gad.468308 (2008).
Gilbert, D. M. Replication timing and transcriptional control: Beyond cause and effect. Curr. Opin. Cell Biol. 14, 377–383. https://doi.org/10.1016/s0955-0674(02)00326-5 (2002).
Rhind, N. & Gilbert, D. M. DNA replication timing. Cold Spring Harb Perspect. Biol. 5, a010132. https://doi.org/10.1101/cshperspect.a010132 (2013).
Fu, H., Baris, A. & Aladjem, M. I. Replication timing and nuclear structure. Curr. Opin. Cell Biol. 52, 43–50. https://doi.org/10.1016/j.ceb.2018.01.004 (2018).
Hulke, M. L., Massey, D. J. & Koren, A. Genomic methods for measuring DNA replication dynamics. Chromosome Res. 28, 49–67. https://doi.org/10.1007/s10577-019-09624-y (2020).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53. https://doi.org/10.1126/science.abj6987 (2022).
Miga, K. H. et al. Centromere reference models for human chromosomes X and Y satellite arrays. Genome Res. 24, 697–707. https://doi.org/10.1101/gr.159624.113 (2014).
Raghuraman, M. K. et al. Replication dynamics of the yeast genome. Science 294, 115–121. https://doi.org/10.1126/science.294.5540.115 (2001).
Kim, S. M., Dubey, D. D. & Huberman, J. A. Early-replicating heterochromatin. Genes Dev. 17, 330–335. https://doi.org/10.1101/gad.1046203 (2003).
Kim, S. M. & Huberman, J. A. Regulation of replication timing in fission yeast. EMBO J. 20, 6115–6126. https://doi.org/10.1093/emboj/20.21.6115 (2001).
Koren, A. et al. Epigenetically-inherited centromere and neocentromere DNA replicates earliest in S-phase. PLoS Genet. 6, e1001068. https://doi.org/10.1371/journal.pgen.1001068 (2010).
Wear, E. E. et al. Genomic analysis of the DNA replication timing program during mitotic S phase in maize (Zea mays) root tips. Plant Cell 29, 2126–2149. https://doi.org/10.1105/tpc.17.00037 (2017).
Ten Hagen, K. G., Gilbert, D. M., Willard, H. F. & Cohen, S. N. Replication timing of DNA sequences associated with human centromeres and telomeres. Mol. Cell Biol. 10, 6348–6355. https://doi.org/10.1128/mcb.10.12.6348-6355.1990 (1990).
Watanabe, Y., Kazuki, Y., Oshimura, M., Ikemura, T. & Maekawa, M. Replication timing in a single human chromosome 11 transferred into the Chinese hamster ovary (CHO) cell line. Gene 510, 1–6. https://doi.org/10.1016/j.gene.2012.08.045 (2012).
Erliandri, I. et al. Replication of alpha-satellite DNA arrays in endogenous human centromeric regions and in human artificial chromosome. Nucleic Acids Res. 42, 11502–11516. https://doi.org/10.1093/nar/gku835 (2014).
Massey, D. J., Kim, D., Brooks, K. E., Smolka, M. B. & Koren, A. Next-generation sequencing enables spatiotemporal resolution of human centromere replication timing. Genes (Basel) https://doi.org/10.3390/genes10040269 (2019).
Koren, A. et al. Differential relationship of DNA replication timing to different forms of human mutation and variation. Am. J. Hum. Genet. 91, 1033–1040. https://doi.org/10.1016/j.ajhg.2012.10.018 (2012).
Koren, A., Massey, D. J. & Bracci, A. N. TIGER: Inferring DNA replication timing from whole-genome sequence data. Bioinformatics https://doi.org/10.1093/bioinformatics/btab166 (2021).
Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178. https://doi.org/10.1126/science.abl4178 (2022).
McNulty, S. M. & Sullivan, B. A. Alpha satellite DNA biology: Finding function in the recesses of the genome. Chromosome Res. 26, 115–138. https://doi.org/10.1007/s10577-018-9582-3 (2018).
Aldrup-MacDonald, M. E., Kuo, M. E., Sullivan, L. L., Chew, K. & Sullivan, B. A. Genomic variation within alpha satellite DNA influences centromere location on human chromosomes with metastable epialleles. Genome Res. 26, 1301–1311. https://doi.org/10.1101/gr.206706.116 (2016).
Giunta, S. et al. CENP-A chromatin prevents replication stress at centromeres to avoid structural aneuploidy. Proc. Natl. Acad. Sci. U.S.A. https://doi.org/10.1073/pnas.2015634118 (2021).
Duan, Z. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367. https://doi.org/10.1038/nature08973 (2010).
Wiblin, A. E., Cui, W., Clark, A. J. & Bickmore, W. A. Distinctive nuclear organisation of centromeres and regions involved in pluripotency in human embryonic stem cells. J. Cell Sci. 118, 3861–3868. https://doi.org/10.1242/jcs.02500 (2005).
Acknowledgements
This work was supported by the National Institutes of Health (DP2-GM123495 to A.K.) and the National Science Foundation (MCB-1921341 to A.K.).
Author information
Contributions
D.J.M. and A.K. conceptualized the project. D.J.M. performed analyses. D.J.M. and A.K. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Massey, D.J., Koren, A. Telomere-to-telomere human DNA replication timing profiles. Sci Rep 12, 9560 (2022). https://doi.org/10.1038/s41598-022-13638-8
DOI: https://doi.org/10.1038/s41598-022-13638-8