The 4D nucleome project

Journal name: Nature
Volume: 549
Pages: 219–226
DOI: doi:10.1038/nature23884

Abstract

The 4D Nucleome Network aims to develop and apply approaches to map the structure and dynamics of the human and mouse genomes in space and time with the goal of gaining deeper mechanistic insights into how the nucleus is organized and functions. The project will develop and benchmark experimental and computational approaches for measuring genome conformation and nuclear organization, and investigate how these contribute to gene regulation and other genome functions. Validated experimental technologies will be combined with biophysical approaches to generate quantitative models of spatial genome organization in different biological states, both in cell populations and in single cells.

The human genome contains over 20,000 genes and a larger number of regulatory elements. Large-scale studies over the last decade have catalogued these components of our genome and the cell types in which they are active. The ENCODE, Roadmap Epigenome, International Human Epigenome Consortium, EpiGeneSys (http://www.epigenesys.eu/en/) and FANTOM projects1, 2, 3, 4 have annotated thousands of genes and millions of candidate regulatory elements. However, our understanding of the mechanisms by which these elements exert regulatory effects on specific target genes across distances of kilobases, and in some cases megabases, remains incomplete.

The spatial folding of chromosomes and their organization in the nucleus have profound effects on gene expression. For example, spatial proximity is necessary for enhancers to modulate transcription of target genes (for example, refs 5, 6, 7), and clustering of chromatin near the nuclear lamina is correlated with gene silencing and replication timing8, 9. In addition, genome-wide association studies have identified large numbers of disease-associated loci, and the majority of these loci are located in distal, potentially regulatory, noncoding regions (for example, ref. 10). In cancer cells, genomic rearrangements frequently occur and these are at least in part guided by the three-dimensional organization of the nucleus11, 12. These data emphasize the importance of distal elements for gene regulation and suggest an exciting opportunity to uncover the fundamental mechanisms of disease through the mapping of long-range chromatin interactions and three-dimensional genome organization. Therefore, to determine how the genome operates, we need to understand not only the linear encoding of information along chromosomes, but also its three-dimensional organization and its dynamics across time, that is, the ‘4D nucleome’. Concomitantly, we must pursue deeper knowledge of the biophysical and molecular factors that determine genome organization, and how this organization contributes to gene regulation and other nuclear activities. Here we outline the goals and strategies of the 4D Nucleome (4DN) Network. This Network builds on other consortia and efforts focusing on (epi-)genome analysis outlined above and adds spatial and temporal dimensions to explore how the genome is organized inside cells and how this relates to genome function.

The nucleus is not a homogeneous organelle, but consists of distinct nuclear structures and non-chromatin bodies as well as defined chromosomal regions, such as centromeres, telomeres and insulator bodies, that have been shown to cluster with each other and other genomic regions to define distinct nuclear compartments13, 14. Examples of nuclear structures include the nuclear lamina and nuclear pores. Examples of nuclear compartments include the heterochromatic compartment, while examples of nuclear bodies include nucleoli, nuclear speckles, paraspeckles, and Cajal and PML (promyelocytic leukaemia) bodies. Chromosome conformation capture (3C) approaches15, 16 have yielded additional insights by characterizing genome-wide chromatin folding at kilobase resolution6, 17, 18. These studies have shown that the genome is compartmentalized in active and inactive spatial compartments at the scale of the nucleus, and that within each compartment, folding of chromatin fibres brings together loci and regulatory elements that are otherwise separated by large genomic distances. CTCF, the cohesin complex and other DNA-binding proteins, as well as RNAs, have roles in organizing chromatin domains and long-range interactions between DNA loci18, 19, 20, 21, 22, 23, 24. These studies indicate that the genome is intricately organized within the nucleus and that this organization has a critical role in gene regulation and activity.

The past decade has seen marked innovation in chromosome and nuclear structure analysis. Genomic approaches for mapping chromatin interactions, such as 3C, 4C (circular 3C, or 3C-on-chip), 5C (3C-carbon copy), Hi-C, and chromatin-interaction analysis by paired-end-tag sequencing (ChIA-PET)16, are yielding genome-wide chromatin interaction maps at unprecedented resolution. Live-cell and super-resolution microscopic approaches, combined with new ways (for example, CRISPR–Cas9-based systems) to visualize loci and sub-nuclear structures, are beginning to provide detailed views of the organization and dynamics of chromatin inside (living) cells25, 26, 27, 28, 29, 30, 31. There has also been pronounced progress in analysing chromosome structural data, producing structural models for chromosome folding32, 33. However, despite this progress, a comprehensive understanding of the 4D nucleome is still lacking. This is partly because different experimental cell systems and approaches are used; together with the absence of shared benchmarks for assay performance, this has led to observations that cannot be directly compared. Additionally, we currently have limited ability to integrate different data types (for example, chromatin interaction data and imaging-based distance measurements) and lack approaches that can measure and account for cell-to-cell variability in chromosome and nuclear organization. Finally, we lack mechanistic insights into the relationships between chromosome conformation and nuclear processes, including transcription, DNA replication and chromosome segregation. These gaps in our knowledge can be addressed by a highly synergistic, multidisciplinary and integrated approach in which groups with different expertise and knowledge, ranging from imaging and genomics to computer science and physics, work closely together to study common cell systems using complementary methods.

Goals and strategy of the project

The 4DN Network will develop a set of approaches to map the structures and dynamics of the genome and to relate these features to its biological activities. The Network aims to generate quantitative models of nuclear organization in diverse cell types and conditions, including in single cells. Overall, we anticipate that these efforts will lead to new mechanistic insights into how the genome is organized, maintained, expressed and replicated, in both normal and disease states.

The 4DN Network will (1) develop, benchmark, validate and standardize a wide array of technologies to analyse the 4D nucleome; (2) integrate, analyse and model datasets obtained with these technologies to obtain a comprehensive view of the 4D nucleome; and (3) investigate the functional role of various structural features of chromosome organization in transcription, DNA replication and other nuclear processes. These three main components are illustrated in Fig. 1.

Figure 1: The 4D Nucleome project.

The project encompasses three components. a, Experimental mapping approaches are used to measure a range of aspects of the spatial organization of the genome, including chromatin loops, domains, nuclear bodies and so on. b, Computational and modelling approaches are used to interpret experimental observations and build (dynamic) spatial models of the nucleus. c, Perturbation experiments, for example, using CRISPR–Cas9-mediated genome engineering, are used for functional validation. In these studies chromatin structures are altered, for example, by removing chromatin loops, creating novel loops at defined positions or tethering regulatory components in selected regions to test their architectural function. These perturbation studies can be complemented with functional studies, for example, analysis of gene expression to assess the functional implications of chromatin folding. The nucleus image (a) shows live cell CRISPR labelling of specific loci on human chromosomes 1 and 13 and is provided by H. Ma and T. Pederson.

To achieve these objectives, we have defined the following key steps. First, a set of common cell lines will be studied to enable direct cross-validation of data that are obtained with different methods (Table 1). Important criteria include a stable, haplotype-phased and normal karyotype, ease of growth, ease of genome editing and suitability for (live-cell) imaging. Furthermore, given that cell populations are characterized by cell-to-cell variation in their biological state (for example, cell-cycle stage), it will be important to use clonal cell populations that can be synchronized, activated, induced or differentiated in a controlled manner.

Table 1: Common cell lines used by the 4DN Network

Second, standards for data formats and quality will be established so that data can be shared broadly. This includes defining metrics for reproducibility and assessment of the sensitivity, specificity, resolution and precision with which aspects of the 4D nucleome can be measured.

Third, computational and analytical tools will be developed to analyse individual datasets and to integrate, compare and cross-validate data obtained with different technologies. Importantly, they will enable the integration of the diverse datasets necessary to build comprehensive models of the 4D nucleome.

Fourth, genetic, biochemical and biophysical approaches will be developed to measure and perturb the roles of DNA sequences and trans-acting factors (proteins, RNA) in the formation of local and global aspects of the 4D nucleome and their impact on transcription and other nuclear functions.

Fifth, a common vocabulary will be developed to describe nuclear features and biophysically derived principles guiding chromosome folding. This is important because, at present, different structural descriptions and interpretations are used to describe features detected by different technologies, or even by the same methods. We need better and more precise descriptions of the underlying state of structural features that make up the 4D nucleome, for example, loops and domains, and a consistent terminology for when these features are detected by different technologies. This can be achieved by integrated analysis of data that will be obtained with the wide range of technologies that are used and under development by the Network.

A major goal is to compare and integrate the wealth of information that is anticipated to be generated by the Network. This will enable both benchmarking of experimental and computational approaches and better interpretation of what each data type (for example, chromosome conformation capture data on the one hand and imaging on the other hand) reveals about the structure, dynamics and cell-to-cell variation in folding of chromosomes. The Network will analyse a small set of common cell lines (Table 1) and select a set of loci that will be studied using the wide array of technologies already employed or under development in the Network. A joint analysis group with members from across the Network will integrate and analyse this diverse dataset to produce benchmarks for each methodology, produce models that represent the folded state of chromosomes and how this is dynamic in real time and variable between cells, and determine how the chromosomal folding state relates to gene regulation.

Finally, to facilitate rapid dissemination of data to the larger scientific community, a shared database and a public 4DN data browser will be established, covering all data, detailed protocols, engineered cell lines and reagents used across the Network.

Structure of the 4DN Network

The 4DN Network encompasses several related efforts (http://www.4dnucleome.org/). First, six centres make up the Nuclear Organization and Function Interdisciplinary Consortium (NOFIC). These centres will develop genomic and imaging technologies, and implement computational models to understand the 4D nucleome. NOFIC centres will work together with other components of the Network to benchmark experimental and computational tools, and to identify the most appropriate repertoire of methods to study the 4D nucleome. These studies will be combined with structural and functional validation of observations and models. Ultimately, the NOFIC aims to deliver integrated approaches that can be used for the generation of a first draft of a model of the 4D nucleome.

Second, ongoing technological development is addressed by the 4DN Network in three ways. (1) New genomic interaction technologies will be developed to study the 4D nucleome at the single-cell level, to analyse the roles of RNA in chromatin architecture and to engineer new chromatin interactions. (2) New imaging and labelling methods will be developed to visualize the genome at high resolution, in live cells as well as in tissues, and in relation to genome activity. Chromatin dynamics will be assayed at high resolution over time scales of seconds to minutes (for example, mitotic compaction, transcription), hours (cell cycle) and days (differentiation). (3) New methods will be developed to probe the DNA, RNA and protein composition of subnuclear structures such as the nuclear envelope and the nucleolus.

Third, a Data Coordination and Integration Center (DCIC; http://dcic.4dnucleome.org/) stores all data generated by the Network and coordinates data analysis. The DCIC will maintain a website to share data and models with the Network and the larger scientific community. An Organizational Hub (OH) coordinates activities across all 4DN centres and teams, manages the 4DN Opportunity Pool (4DN-OP) of funds, and maintains the 4DN web portal (https://www.4dnucleome.org/), which releases all resources generated by the 4DN Network, including data (through the DCIC), experimental protocols, data analysis protocols, software, cell line information and educational materials. The OH will also release 4DN-OP grant opportunities and application procedures through the web portal. Finally, a 4DN Network Outreach/Education Working Group works in collaboration with the OH to increase the visibility of the 4DN Network and its associated resources, and to foster interactions and collaborations with the larger biomedical community.

Research plans

The Network uses and develops a wide range of experimental technologies to study the organization of the genome and the nucleus, and a set of computational approaches to analyse the data and to start to build models of the 4D nucleome. Further experiments include testing the causal roles and functional consequences of chromosome folding for genome regulation. Below we describe these efforts in more detail.

Genomic technologies to reveal the 4D nucleome

3C technologies have been developed to examine long-range interactions across the genome15, 16. Genome-wide 3C technologies, for example, Hi-C, have revealed patterns of interactions that define genome structures at various resolutions, including loops and topologically associating domains (TADs)17, 18, 34, 35. TADs can be hundreds of kilobases in size, often containing several genes and multiple enhancers, at least some of which appear to interact by looping mechanisms. The ChIA-PET method provides a finer resolution to detect structures defined by architectural proteins, such as CTCF and cohesin, as well as enhancer–promoter interactions associated with RNAPII and other transcription factors6. Furthermore, genome-wide mapping at base-pair resolution to detect haplotype-specific interactions is in progress, which will enable chromatin topology to be connected to the vast body of genetic information on complex traits and diseases. The Network will continue to develop 3C-based technologies, including genome-wide methods that enable the exploration of higher-order (beyond pairwise) DNA contacts, detection of chromatin interactions in (thousands of) individual cells36, and mapping of RNA–DNA interactions (Table 2).

Table 2: Genomic technologies currently in use or in development in the 4DN Network

A limitation of current 3C methods is that they depend on a single crosslinker, formaldehyde, which has known biases in the type of residues it can crosslink. Because of the nature of formaldehyde, which is known to polymerize and crosslink molecules across a large range of distances, this approach lacks precise distance information. The Network will explore bivalent photo-activated crosslinkers that are separated by linkers of defined length and flexibility.

Imaging the 4D nucleome

4DN investigators will develop and integrate imaging platforms that enable visualizing the dynamics, interactions and structural organization of the nucleus at unprecedented temporal and spatial resolutions (Table 3). Each of these approaches has unique and complementary abilities for the analysis of different aspects of genome organization. In particular, platforms enabling live-cell imaging allow the dynamics of select chromatin regions and nuclear features to be studied in real time (seconds to hours).

Table 3: Imaging technologies currently in use or in development in the 4DN Network

Standard and high-throughput fluorescent in situ hybridization (FISH) using oligonucleotide probes or guide RNA-mediated recruitment of fluorescently labelled dCas9 (CASFISH31) in fixed cells will be exploited to image genomic interactions over different spatial distances in different cell types and states. These imaging tools will have an important role in benchmarking, validating and complementing data obtained with genomic and proteomic mapping technologies. CRISPR–dCas9 FISH in live cells29, 30, 38 and other live-cell imaging approaches (Table 3) will be used to assay the dynamic behaviour of particular chromatin regions and/or nuclear structures in real time.

New technologies will be developed to label DNA, RNA and proteins that occur in proximity of specific nuclear bodies. These proximity mapping technologies include the use of horseradish peroxidase (HRP)-labelled antibodies for tyramide signal amplification–sequencing (TSA–seq) (Table 2), APEX (engineered ascorbate peroxidase39) for electron microscopy and live cell proteomic fingerprinting and the photosensitizer, Killer Red40, for free radical generation within nuclear microenvironments. Genome-editing technologies will also be applied to tag a subset of key genomic loci, loops, TADs and potentially newly discovered structures to help visualize these moieties and document their interactions with other nuclear regions in live cells.

Super-resolution microscopy, single-molecule tracking techniques and multiplex fluorescent/chemical tags will be used in living cells to determine the dynamic interactions, diffusion and motion of fluorescently labelled proteins, non-coding RNA (ncRNA) and genomic loci (Table 3). These live-cell imaging approaches are expected to provide information regarding the search mechanism, binding and residence time of DNA and protein interactions and will also be used to validate and complement genomic methods used by the Network.

Soft X-ray tomography (SXT41) will be used to visualize the 3D organization of chromatin in nuclei of cells in the native state (cryo-immobilized). SXT will be used to directly measure chromatin compaction, for example, in relation to sub-nuclear position, at different stages of the cell cycle, during differentiation and in different cell types. Correlated microscopy approaches will be used to augment ultrastructural data with molecular localization information. Cryogenic fluorescence tomography (CFT42) will be used to precisely locate molecules in three-dimensional reconstructions of intact cells in their native state that are imaged using SXT.

Members of the Network will develop new electron microscopy (EM) technologies that enable the local and global structural organization of chromatin to be visualized as a continuum from nucleosome to Mb scale in both interphase and mitotic cells. One such method, chrom-EMT43, will be combined with new genetic tags and nanoparticle-labelling technologies to develop the electron microscopy equivalent of ‘multi-colour’ fluorescence.

The development of automated imaging analysis pipelines and data standards will be important to extract the maximum structural information possible from these datasets. Further development of software for analysing, annotating and archiving imaging data, together with implementation of new approaches for correlating imaging and genomics datasets, are major goals of the 4DN Network (see below).

Nuclear bodies and non-chromatin structures

The nucleus consists of distinct nuclear structures, such as the nuclear lamina and nuclear pores, chromatin-associated bodies, such as nucleoli that are initiated at specific genomic loci, as well as non-chromatin bodies, such as nuclear speckles, and PML bodies44, 45. Increasing evidence indicates that specific genomic regions associate with these structures, suggesting that these chromosomal associations may have a functional role in regulating genome function8, 46, 47.

Goals of the 4DN Network further include development of new mapping methodologies to measure the genome-wide molecular interaction frequency and cytological distance of chromosome loci to major nuclear compartments, including the nuclear lamina, nuclear pores, nuclear speckles, nucleoli and pericentric heterochromatin (Table 2). Concurrently, new and improved technologies will be developed, including localized APEX-mediated protein biotinylation48, fractionation by cryomilling and RNA antisense purification (RAP49, 50), to catalogue and measure the protein and RNA components of these nuclear compartments, as well as both optogenetic and degron-based approaches to alter or disrupt sub-nuclear bodies and compartments. Functional mapping approaches based on replicated DNA sequencing (Repli–seq)9 and TRIP (thousands of reporters integrated in parallel)51 will provide genome-wide correlations of DNA-replication timing and effects of chromosome position on transcription and RNA processing that can be correlated with these structural maps (Table 2). New imaging approaches will be developed to correlate chromosome and nuclear compartment dynamics with changes in DNA-replication timing, transcriptional activation and other functional states (Table 3). Computational analyses of these genome mapping data from several cell types will be aimed at identifying possible cis- and trans-determinants of nuclear compartmentalization.

Modelling the 4D Nucleome

In parallel with the emergence of increasingly powerful experimental methods has been the development of computational approaches for modelling the spatial organization of the genome. There are at least two major computational approaches for modelling genome architecture on the basis of experimental data: data-driven and de novo approaches33 (Fig. 2). Data-driven approaches directly use experimental data (Hi-C, imaging, and so on) to produce an ensemble of conformations that best match an experimentally observed set of contact probabilities and distances52, 53. De novo modelling, on the other hand, produces ensembles of conformations that result from known or hypothesized physical or biological processes, and tests whether these ensembles are consistent with features of experimental contact frequency maps and imaging data (for example, refs 19, 54). Such de novo models can suggest specific molecular mechanisms and principles of chromosome organization, and can be predictive of chromosome dynamics and therefore can go far beyond the experimental data33.

Figure 2: Modelling the 4D Nucleome.

Data obtained with imaging and chromosome conformation capture-based assays can be used for building spatial and dynamic models of chromosomes using two main approaches. In the data-driven approach, experimental data are used directly to generate ensembles of conformations that reproduce the experimental observations. In the de novo approach, ensembles of conformations are built according to known or hypothesized physical or biological processes. Models are then selected based on their agreement with experimental data.
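The de novo route can be illustrated with a deliberately simple sketch. It assumes an ideal random-walk chain as the hypothesized physical process (chosen for brevity, not taken from the Network's models), builds an ensemble of conformations, computes contact probability as a function of separation along the chain, and scores the ensemble against observed values. The function names, the contact radius and the "observed" numbers are all hypothetical placeholders.

```python
import math
import random


def random_walk_conformation(n_monomers, rng):
    """One ideal-chain conformation: independent unit-variance Gaussian steps in 3D."""
    x = y = z = 0.0
    coords = [(x, y, z)]
    for _ in range(n_monomers - 1):
        x += rng.gauss(0.0, 1.0)
        y += rng.gauss(0.0, 1.0)
        z += rng.gauss(0.0, 1.0)
        coords.append((x, y, z))
    return coords


def contact_probability(ensemble, separation, radius):
    """Fraction of monomer pairs (i, i + separation) closer than radius, pooled over the ensemble."""
    hits = total = 0
    for coords in ensemble:
        for i in range(len(coords) - separation):
            if math.dist(coords[i], coords[i + separation]) < radius:
                hits += 1
            total += 1
    return hits / total


def log_score(model_p, observed_p):
    """Sum of squared differences between log contact probabilities (lower is better)."""
    return sum((math.log(model_p[s]) - math.log(observed_p[s])) ** 2 for s in observed_p)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
ensemble = [random_walk_conformation(64, rng) for _ in range(200)]
model_p = {s: contact_probability(ensemble, s, radius=1.5) for s in (1, 4, 16)}

# Placeholder "observed" values, invented for illustration; not real Hi-C data.
observed_p = {1: 0.5, 4: 0.1, 16: 0.01}
score = log_score(model_p, observed_p)
```

In practice, competing mechanisms would each generate their own ensemble, and the one with the best agreement with experimental contact maps and imaging data would be retained. A data-driven approach would instead adjust the conformations themselves until they reproduce the observations.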

Current modelling approaches face several challenges and offer several opportunities. The first challenge lies in the wide diversity of technologies, which capture complementary aspects of genome organization: contact frequencies, distances, proximities to various nuclear bodies, and so on. Relationships between these data can be complicated: for example, contact frequency is distinct from average spatial distance, which can create seemingly paradoxical relationships between Hi-C and FISH55. Current modelling approaches, however, can systematically integrate a variety of data to generate comprehensive structural and dynamic models of the 4D nucleome. Such models can be validated against data not used for model selection, for example, by predicting dynamics from static data and testing the predictions using live imaging.
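The distinction between contact frequency and average distance can be made concrete with a toy calculation. The per-cell distances below are invented purely for illustration (arbitrary units, hypothetical contact threshold): one locus pair is either touching or far apart, whereas the other always sits at a moderate distance.

```python
from statistics import mean


def contact_frequency(distances, threshold):
    """Fraction of cells in which the two loci are closer than the contact threshold."""
    return sum(d < threshold for d in distances) / len(distances)


# Hypothetical distances across ten cells, invented for illustration only.
pair_a = [0.1] * 4 + [2.0] * 6  # bimodal: in contact in 4 cells, far apart in 6
pair_b = [0.5] * 10             # always moderately close, never in contact

threshold = 0.2  # assumed capture radius of a contact-based assay

freq_a = contact_frequency(pair_a, threshold)
freq_b = contact_frequency(pair_b, threshold)
mean_a = mean(pair_a)
mean_b = mean(pair_b)
```

Here pair A has the higher contact frequency but also the larger mean distance, so a contact-based assay and a distance-based assay would rank the two pairs in opposite order even though both measurements are correct.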

Second, most current genomic methods yield data from ensembles of thousands to millions of cells, obscuring structural heterogeneity that exists among single cells. A number of groups within the Network are developing methods for generating data from large numbers of single cells, which will present new computational challenges for the integration with current modelling approaches36. It is possible that some of these methods will yield functional data on these same single cells (that is, Hi-C and RNA sequencing (RNA-seq), from each of many single cells), which would represent an opportunity for directly relating structure to function.

Third, most current models do not account for the fact that mammalian cells are diploid, that is, they do not distinguish or separately model homologous chromosomes, which will be particularly important for modelling based on single-cell data. The haplotype-resolution of the genomes of the common cell types chosen by the consortium will aid in this goal.

Fourth, contemporary approaches for modelling genome architecture are typically static rather than dynamic, reflecting the static nature of available Hi-C data and the majority of imaging data. As we are increasingly able to visualize (by direct imaging) or infer (by single-cell or bulk Hi-C analyses of time series) chromatin dynamics, for example, during differentiation and cell cycle progression, it will be essential that these observations can be integrated into computational models. The two classes of modelling approaches tackle dynamics differently. Data-driven modelling can use Hi-C data obtained from different time points (for example, stages of differentiation or cell cycle) to build conformational ensembles for each point and then hypothesize about possible mechanisms that led to the observed reorganizations. De novo modelling, in turn, can test whether a particular mechanism, which needs to be stipulated first, could lead to the observed temporal changes in Hi-C data. Polymer models can further show whether observed temporal reorganizations can reflect the slow equilibration process of generally non-equilibrium chromosomes. Moreover, mechanistic de novo models can be further validated by dynamic data from live-cell imaging experiments, for example, by examining mean-squared displacements of different chromosomal loci versus time in experiment and simulations56.
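The mean-squared displacement (MSD) comparison mentioned above can be sketched as follows. This is a minimal, time-averaged MSD for a single tracked locus, together with the scaling exponent between two lags; the function names and the toy trajectory are assumptions made for illustration, and the same functions could be applied to either an imaged or a simulated trajectory.

```python
import math


def mean_squared_displacement(trajectory, lag):
    """Time-averaged MSD of one locus at a given lag (in frames)."""
    n_pairs = len(trajectory) - lag
    if lag <= 0 or n_pairs <= 0:
        raise ValueError("lag must be positive and shorter than the trajectory")
    total = 0.0
    for t in range(n_pairs):
        start, end = trajectory[t], trajectory[t + lag]
        total += sum((b - a) ** 2 for a, b in zip(start, end))
    return total / n_pairs


def scaling_exponent(trajectory, lag_short, lag_long):
    """Slope of log(MSD) versus log(lag) between two lags."""
    msd_short = mean_squared_displacement(trajectory, lag_short)
    msd_long = mean_squared_displacement(trajectory, lag_long)
    return math.log(msd_long / msd_short) / math.log(lag_long / lag_short)


# Toy trajectory: a locus moving at constant velocity along x (directed motion).
toy_trajectory = [(float(t), 0.0, 0.0) for t in range(10)]

msd_1 = mean_squared_displacement(toy_trajectory, 1)
msd_3 = mean_squared_displacement(toy_trajectory, 3)
alpha = scaling_exponent(toy_trajectory, 1, 3)
```

For this directed toy motion the MSD grows as the square of the lag, giving an exponent of 2; free diffusion would give an exponent of 1, and constrained motion a smaller value. Comparing such exponents between experiment and simulation is one way a mechanistic model could be checked.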

Fifth, new data and future models should help to connect genome architecture and other aspects of genome function, for example, by suggesting molecular mechanisms of how transcription-factor binding or epigenetic modifications can lead to formation of active/inactive chromatin compartments57. The inferred mechanisms could generate testable predictions of sequence–structure–function relationships, that is, how nuclear architecture relates to nuclear function. Availability of temporal Hi-C, functional data and models of chromatin organization and dynamics can enable the identification of causality if certain functional characteristics at earlier time points are predictive of later chromosomal states, or vice versa. Such associations and inferred causations can then be further tested experimentally.

Relating structure to function

An important and overarching goal is to determine how genome structure and chromatin conformation modulate genome function in health and disease. To this end, the 4DN Network will explore experimental approaches to manipulate and perturb different features of the 4D nucleome. First, using CRISPR–Cas9 technologies, DNA elements involved in specific chromatin structures, for example, domain boundaries or chromatin loops, can be altered, re-located or deleted58, 59. Second, defined chromatin structures, such as chromatin loops, will be engineered de novo by targeting proteins that can (be induced to) dimerize with their partner looping proteins (for example, ref. 7). Third, other CRISPR–Cas9 approaches will be used to target enzymes (for example, histone-modifying enzymes, structural proteins) or ncRNAs to specific sites in the genome. Fourth, several groups will perturb nuclear compartmentalization by developing methods for ‘rewiring’ chromosome regions to different nuclear compartments, either by integrating specific DNA sequences that are capable of autonomous targeting of the locus to different nuclear compartments or by tethering certain proteins to these loci to accomplish similar re-positioning. Fifth, cell lines will be generated for conditional or temporal ablation of nuclear bodies or candidate chromosome architectural proteins (such as CTCF and cohesin) or RNAs. Sixth, additional methods will be developed to nucleate nuclear bodies at specific chromosomal loci. Finally, biophysical approaches will be developed to micro-mechanically perturb cell nuclei and chromosomes followed by direct imaging of specific loci60. Although it remains challenging to establish direct cause-and-effect relationships, analysis of the effects of any of these perturbations on processes, such as gene expression and DNA replication, can provide deeper mechanistic insights into the roles of chromosome structure and nuclear organization in regulating the genome.

Data sharing and standards

The Network will develop guidelines for data formats, metadata (descriptions of how the data were acquired), standards, quality control measures and other key data-related issues. Another goal is to make these data rapidly accessible, both within the Network and to the entire scientific community. These efforts will be of particular importance for new technologies for which standards for sharing data and assessing data quality have not yet been established. Such standards will greatly enhance the usefulness of the datasets for the broader scientific community beyond those who generate the data.

For sequencing-based technologies, data format standards to represent sequences and alignments have long been present (for example, FASTQ, BAM/SAM). However, common formats to represent three-dimensional interactions are yet to be developed. These formats need to account for large data sizes and the constraints imposed by different computer architectures. For Hi-C data, for example, the genome-wide contact probability map is an N × N matrix, where N is the number of genomic bins at the chosen resolution (for 10-kb resolution, N = 300,000), with most of the entries being empty. There are multiple ways to represent such sparse matrices, appropriate for different analysis and storage approaches. For imaging technologies, the situation is even more challenging, as the types of microscopes that will be used are highly variable, and the data formats and analysis tools are often dependent on the manufacturer. Standards to unify data and metadata from different manufacturers, such as the Open Microscopy Environment (https://www.openmicroscopy.org/site), are under development. These standards also need to accommodate the rapid developments in super-resolution microscopy.
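
The size and sparsity argument above can be made concrete with a small sketch. The Python below is illustrative only and is not a 4DN format. It assumes a genome length of roughly 3 Gb, a 4-byte count per matrix cell, and hypothetical read-pair positions on a single coordinate axis; the names build_sparse_contacts and to_coo are invented for this example. It bins read pairs at 10 kb and stores only the non-zero cells of the upper triangle, which is one of several possible sparse layouts.

```python
from collections import defaultdict

GENOME_SIZE_BP = 3_000_000_000  # approximate human genome length (assumption)
BIN_SIZE_BP = 10_000            # 10-kb resolution, as in the text

n_bins = GENOME_SIZE_BP // BIN_SIZE_BP   # number of bins, N
dense_entries = n_bins ** 2              # cells in a dense N x N matrix
dense_gib = dense_entries * 4 / 2**30    # size if every cell is a 4-byte integer


def build_sparse_contacts(pairs, bin_size=BIN_SIZE_BP):
    """Bin (pos_a, pos_b) read pairs into a sparse upper-triangular map.

    Only non-zero cells are stored, keyed by (row_bin, col_bin) with
    row_bin <= col_bin, because a contact map is symmetric.
    """
    contacts = defaultdict(int)
    for pos_a, pos_b in pairs:
        i, j = sorted((pos_a // bin_size, pos_b // bin_size))
        contacts[(i, j)] += 1
    return dict(contacts)


def to_coo(contacts):
    """Convert the map to coordinate (COO) lists: rows, cols, counts."""
    keys = sorted(contacts)
    rows = [i for i, _ in keys]
    cols = [j for _, j in keys]
    counts = [contacts[key] for key in keys]
    return rows, cols, counts


# Hypothetical read pairs, for illustration only.
example_pairs = [(12_500, 48_000), (49_999, 10_001), (1_000_000, 1_004_000)]
sparse = build_sparse_contacts(example_pairs)
rows, cols, counts = to_coo(sparse)
```

Under these assumptions a dense matrix would need about 335 GiB, whereas the sparse map grows only with the number of distinct bin pairs that were actually observed in contact.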

A related issue is to define a set of appropriate metadata fields and minimum metadata requirements such that sufficient and useful details are available to other investigators outside the Network. While not all information can be captured about an experiment, collecting pertinent information will increase the reproducibility of experiments and the likelihood that the data will be re-used by other investigators. The 4DN Network has established formal working groups, including the 4DN Data Analysis Group, Omics Data Standards Group and Imaging Data Standards Group, which will define these 4DN standards and data analysis protocols.

Developing a set of measures for assessing data quality and determining appropriate thresholds will be important for ensuring high-quality 4DN data. An important measure of data quality and reliability of new technologies is the reproducibility of results between repeated experiments. Reproducibility can be assessed at multiple levels; for example, technical reproducibility measures how well a technique performs for the same starting material, whereas biological reproducibility should also capture all other variations, including heterogeneity among samples. The 4DN Data Analysis Group will compute and make available quality control measures and provide recommendations on expected quality standard thresholds so that investigators can make decisions regarding the utility of specific datasets for addressing their specific questions.
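
As a rough illustration of what a replicate-level reproducibility measure might look like, the sketch below computes a Pearson correlation between the binned contact counts of two replicates. This is an assumption chosen for simplicity, not a measure specified by the 4DN Data Analysis Group; the function name pearson_between_replicates and the toy count maps are hypothetical. Correlation of raw counts is a crude measure, in part because contact frequency falls off strongly with genomic distance, so it should be read as a starting point only.

```python
import math


def pearson_between_replicates(rep_a, rep_b):
    """Pearson correlation of counts over cells non-zero in either replicate.

    Each replicate is a dict mapping (row_bin, col_bin) to a contact count;
    a cell missing from one replicate is treated as a count of zero.
    """
    cells = sorted(set(rep_a) | set(rep_b))
    xs = [rep_a.get(cell, 0) for cell in cells]
    ys = [rep_b.get(cell, 0) for cell in cells]
    n = len(cells)
    if n < 2:
        raise ValueError("need at least two cells to correlate")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant counts")
    return cov / math.sqrt(var_x * var_y)


# Toy count maps (hypothetical). rep_2 is rep_1 sequenced twice as deeply;
# rep_3 is a discordant sample that lacks one of the cells.
rep_1 = {(0, 1): 10, (1, 2): 20, (2, 3): 30}
rep_2 = {(0, 1): 20, (1, 2): 40, (2, 3): 60}
rep_3 = {(0, 1): 30, (2, 3): 10}

concordant_r = pearson_between_replicates(rep_1, rep_2)
discordant_r = pearson_between_replicates(rep_1, rep_3)
```

The same function could be applied to technical replicates (same starting material) and to biological replicates (independent samples), with the expectation that the latter score lower because they also capture sample heterogeneity.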

Finally, to ensure rapid dissemination of findings made by the 4DN Network to the larger scientific community, the Network has adopted a transparent and open publication policy, where all work supported by the Network is submitted to a public preprint server such as bioRxiv before submission to a peer-reviewed journal.

Outlook

After determining the complete DNA sequence of the human genome and subsequent mapping of most genes and potential regulatory elements, we are now entering what can be considered the third phase of the human genome project. In this phase, which builds upon and extends other epigenome mapping efforts mentioned above, the spatial organization of the genome is elucidated and its functional implications revealed. This requires a wide array of technologies from the fields of imaging, genomics, genetic engineering, biophysics, computational biology and mathematical modelling. The 4DN Network, as presented here, provides a mechanism to address this uniquely interdisciplinary challenge. Furthermore, the policy of openness and transparency both within the Network and with the broader scientific community, and the public sharing of all methods, data and models will ensure rapid dissemination of new knowledge, further enhancing the potential impact of the work. This will also require fostering collaborations and establishing connections to other related efforts around the world that are currently under development, for example, the initiative to start a European 4DN project (https://www.4dnucleome.eu). Together these integrated studies promise to move the field from a one-dimensional representation of the genome as a long DNA sequence to a spatially and dynamically organized three-dimensional structure of the living and functional genome inside cells.

References

  1. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)
  2. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015)
  3. Stunnenberg, H. G., International Human Epigenome Consortium & Hirst, M. The international human epigenome consortium: a blueprint for scientific collaboration and discovery. Cell 167, 1145–1149 (2016)
  4. The FANTOM Consortium et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005)
  5. Tolhuis, B., Palstra, R. J., Splinter, E., Grosveld, F. & de Laat, W. Looping and interaction between hypersensitive sites in the active β-globin locus. Mol. Cell 10, 1453–1465 (2002)
  6. Tang, Z. et al. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell 163, 1611–1627 (2015)
  7. Deng, W. et al. Reactivation of developmentally silenced globin genes by forced chromatin looping. Cell 158, 849–860 (2014)
  8. Guelen, L. et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948–951 (2008)
  9. Pope, B. D. et al. Topologically associating domains are stable units of replication-timing regulation. Nature 515, 402–405 (2014)
  10. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012)
  11. Roix, J. J., McQueen, P. G., Munson, P. J., Parada, L. A. & Misteli, T. Spatial proximity of translocation-prone gene loci in human lymphomas. Nat. Genet. 34, 287–291 (2003)
  12. Zhang, Y. et al. Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell 148, 908–921 (2012)
  13. Cremer, T. & Cremer, C. Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nat. Rev. Genet. 2, 292–301 (2001)
  14. Bickmore, W. A. The spatial organization of the human genome. Annu. Rev. Genomics Hum. Genet. 14, 67–84 (2013)
  15. Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002)
  16. Denker, A. & de Laat, W. The second decade of 3C technologies: detailed insights into nuclear organization. Genes Dev. 30, 1357–1382 (2016)
  17. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009)
  18. Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014)
  19. Fudenberg, G. et al. Formation of chromosomal domains by loop extrusion. Cell Reports 15, 2038–2049 (2016)
  20. Sanborn, A. L. et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl Acad. Sci. USA 112, E6456–E6465 (2015)
  21. de Wit, E. et al. CTCF binding polarity determines chromatin looping. Mol. Cell 60, 676–684 (2015)
  22. Vietri Rudan, M. et al. Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Reports 10, 1297–1309 (2015)
  23. Dekker, J. & Mirny, L. The 3D genome as moderator of chromosomal communication. Cell 164, 1110–1121 (2016)
  24. Engreitz, J. M., Ollikainen, N. & Guttman, M. Long non-coding RNAs: spatial amplifiers that control nuclear structure and gene expression. Nat. Rev. Mol. Cell Biol. 17, 756–770 (2016)
  25. Hess, S. T., Girirajan, T. P. & Mason, M. D. Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91, 4258–4272 (2006)
  26. Betzig, E. et al. Imaging intracellular fluorescent proteins at nanometer resolution. Science 313, 1642–1645 (2006)
  27. Rust, M. J., Bates, M. & Zhuang, X. Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods 3, 793–795 (2006)
  28. Cisse, I. I. et al. Real-time dynamics of RNA polymerase II clustering in live human cells. Science 341, 664–667 (2013)
  29. Ma, H. et al. Multiplexed labeling of genomic loci with dCas9 and engineered sgRNAs using CRISPRainbow. Nat. Biotechnol. 34, 528–530 (2016)
  30. Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013)
  31. Deng, W., Shi, X., Tjian, R., Lionnet, T. & Singer, R. H. CASFISH: CRISPR/Cas9-mediated in situ labeling of genomic loci in fixed cells. Proc. Natl Acad. Sci. USA 112, 11870–11875 (2015)
  32. Marti-Renom, M. A. & Mirny, L. A. Bridging the resolution gap in structural modeling of 3D genome organization. PLOS Comput. Biol. 7, e1002125 (2011)
  33. Imakaev, M. V., Fudenberg, G. & Mirny, L. A. Modeling chromosomes: beyond pretty pictures. FEBS Lett. 589, 3031–3036 (2015)
  34. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012)
  35. Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012)
  36. Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013)
  37. Beagrie, R. A. et al. Complex multi-enhancer contacts captured by genome architecture mapping. Nature 543, 519–524 (2017)
  38. Ma, H. et al. Multicolor CRISPR labeling of chromosomal loci in human cells. Proc. Natl Acad. Sci. USA 112, 3002–3007 (2015)
  39. Martell, J. D. et al. Engineered ascorbate peroxidase as a genetically encoded reporter for electron microscopy. Nat. Biotechnol. 30, 1143–1148 (2012)
  40. Bulina, M. E. et al. A genetically encoded photosensitizer. Nat. Biotechnol. 24, 95–99 (2006)
  41. Le Gros, M. A. et al. Soft X-ray tomography reveals gradual chromatin compaction and reorganization during neurogenesis in vivo. Cell Reports 17, 2125–2136 (2016)
  42. Smith, E. A. et al. Quantitatively imaging chromosomes by correlated cryo-fluorescence and soft X-ray tomographies. Biophys. J. 107, 1988–1996 (2014)
  43. Ou, H. D. et al. ChromEMT: Visualizing 3D chromatin structure and compaction in interphase and mitotic cells. Science 357, eaag0025 (2017)
  44. Dundr, M. Nuclear bodies: multifunctional companions of the genome. Curr. Opin. Cell Biol. 24, 415–422 (2012)
  45. Mao, Y. S., Zhang, B. & Spector, D. L. Biogenesis and function of nuclear bodies. Trends Genet. 27, 295–306 (2011)
  46. Németh, A. et al. Initial genomics of the human nucleolus. PLoS Genet. 6, e1000889 (2010)
  47. van Koningsbruggen, S. et al. High-resolution whole-genome sequencing reveals that specific chromatin domains from most human chromosomes associate with nucleoli. Mol. Biol. Cell 21, 3735–3748 (2010)
  48. Lee, S. Y. et al. APEX fingerprinting reveals the subcellular localization of proteins of interest. Cell Reports 15, 1837–1847 (2016)
  49. McHugh, C. A. et al. The Xist lncRNA interacts directly with SHARP to silence transcription through HDAC3. Nature 521, 232–236 (2015)
  50. Engreitz, J. M. et al. The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome. Science 341, 1237973 (2013)
  51. Akhtar, W. et al. Chromatin position effects assayed by thousands of reporters integrated in parallel. Cell 154, 914–927 (2013)
  52. Serra, F. et al. Restraint-based three-dimensional modeling of genomes and genomic domains. FEBS Lett. 589, 2987–2995 (2015)
  53. Zhu, Y. et al. Comprehensive characterization of neutrophil genome topology. Genes Dev. 31, 141–153 (2017)
  54. Naumova, N. et al. Organization of the mitotic chromosome. Science 342, 948–953 (2013)
  55. Fudenberg, G. & Imakaev, M. FISH-ing for captured contacts: towards reconciling FISH and 3C. Nat. Methods 14, 673–678 (2017)
  56. Lucas, J. S., Zhang, Y., Dudko, O. K. & Murre, C. 3D trajectories adopted by coding and regulatory DNA elements: first-passage times for genomic interactions. Cell 158, 339–352 (2014)
  57. Jost, D., Carrivain, P., Cavalli, G. & Vaillant, C. Modeling epigenome folding: formation and dynamics of topologically associated chromatin domains. Nucleic Acids Res. 42, 9553–9561 (2014)
  58. Xiong, X., Chen, M., Lim, W. A., Zhao, D. & Qi, L. S. CRISPR/Cas9 for human genome engineering and disease research. Annu. Rev. Genomics Hum. Genet. 17, 131–154 (2016)
  59. Wang, H., La Russa, M. & Qi, L. S. CRISPR/Cas9 in genome editing and beyond. Annu. Rev. Biochem. 85, 227–264 (2016)
  60. Poirier, M. G. & Marko, J. F. Micromechanical studies of mitotic chromosomes. Curr. Top. Dev. Biol. 55, 75–141 (2003)
  61. Rodley, C. D., Bertels, F., Jones, B. & O’Sullivan, J. M. Global identification of yeast chromosome interactions using genome conformation capture. Fungal Genet. Biol. 46, 879–886 (2009)
  62. Simonis, M. et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348–1354 (2006)
  63. Dostie, J. et al. Chromosome conformation capture carbon copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006)
  64. Ramani, V. et al. Massively multiplex single-cell Hi-C. Nat. Methods 14, 263–266 (2017)
  65. Ma, W. et al. Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes. Nat. Methods 12, 71–78 (2015)
  66. Hsieh, T.-H. S. et al. Mapping nucleosome resolution chromosome folding in yeast by Micro-C. Cell 162, 108–119 (2015)
  67. Hsieh, T.-H. S., Fudenberg, G., Goloborodko, A. & Rando, O. J. Micro-C XL: assaying chromosome conformation from the nucleosome to the entire genome. Nat. Methods 13, 1009–1011 (2016)
  68. Dryden, N. H. et al. Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C. Genome Res. 24, 1854–1868 (2014)
  69. Hughes, J. R. et al. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat. Genet. 46, 205–212 (2014)
  70. Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011)
  71. Darrow, E. M. et al. Deletion of DXZ4 on the human inactive X chromosome alters higher-order genome architecture. Proc. Natl Acad. Sci. USA 113, E4504–E4512 (2016)
  72. Fullwood, M. J. et al. An oestrogen-receptor-α-bound human chromatin interactome. Nature 462, 58–64 (2009)
  73. van Steensel, B. & Henikoff, S. Identification of in vivo DNA targets of chromatin proteins using tethered dam methyltransferase. Nat. Biotechnol. 18, 424–428 (2000)
  74. Hiratani, I. et al. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol. 6, e245 (2008)
  75. Hansen, R. S. et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc. Natl Acad. Sci. USA 107, 139–144 (2010)
  76. Chu, C. et al. Systematic discovery of Xist RNA binding proteins. Cell 161, 404–416 (2015)
  77. Simon, M. D. et al. The genomic binding sites of a noncoding RNA. Proc. Natl Acad. Sci. USA 108, 20497–20502 (2011)
  78. Strongin, D. E., Groudine, M. & Politz, J. C. Nucleolar tethering mediates pairing between the IgH and Myc loci. Nucleus 5, 474–481 (2014)
  79. Lubeck, E., Coskun, A. F., Zhiyentayev, T., Ahmad, M. & Cai, L. Single-cell in situ RNA profiling by sequential hybridization. Nat. Methods 11, 360–361 (2014)
  80. Ma, H., Reyes-Gutierrez, P. & Pederson, T. Visualization of repetitive DNA sequences in human chromosomes with transcription activator-like effectors. Proc. Natl Acad. Sci. USA 110, 21048–21053 (2013)
  81. Miyanari, Y., Ziegler-Birling, C. & Torres-Padilla, M. E. Live visualization of chromatin dynamics with fluorescent TALEs. Nat. Struct. Mol. Biol. 20, 1321–1324 (2013)
  82. Shachar, S., Voss, T. C., Pegoraro, G., Sciascia, N. & Misteli, T. Identification of gene positioning factors using high-throughput imaging mapping. Cell 162, 911–923 (2015)
  83. Ma, H. et al. CRISPR–Cas9 nuclear dynamics and target recognition in living cells. J. Cell Biol. 214, 529–537 (2016)
  84. Takei, Y., Shah, S., Harvey, S., Qi, L. S. & Cai, L. Multiplexed dynamic imaging of genomic loci by combined CRISPR imaging and DNA sequential FISH. Biophys. J. 112, 1773–1776 (2017)
  85. Hocine, S., Raymond, P., Zenklusen, D., Chao, J. A. & Singer, R. H. Single-molecule analysis of gene expression using two-color RNA labeling in live yeast. Nat. Methods 10, 119–121 (2013)
  86. Fukaya, T., Lim, B. & Levine, M. Enhancer control of transcriptional bursting. Cell 166, 358–368 (2016)
  87. Robinett, C. C. et al. In vivo localization of DNA sequences and visualization of large-scale chromatin organization using lac operator/repressor recognition. J. Cell Biol. 135, 1685–1700 (1996)
  88. Chen, H., Fujioka, M. & Gregor, T. Direct visualization of transcriptional activation by physical enhancer–promoter proximity. Preprint at http://www.biorxiv.org/content/early/2017/01/11/099523 (2017)
  89. Phan, S. et al. 3D reconstruction of biological structures: automated procedures for alignment and reconstruction of multiple tilt series in electron tomography. Adv. Struct. Chem. Imaging 2, 8 (2017)
  90. Soto, G. E. et al. Serial section electron tomography: a method for three-dimensional reconstruction of large structures. NeuroImage 1, 230–243 (1994)
  91. Adams, S. R. et al. Multicolor electron microscopy for simultaneous visualization of multiple molecular species. Cell Chem. Biol. 23, 1417–1427 (2016)
  92. Szymborska, A. et al. Nuclear pore scaffold structure analyzed by super-resolution microscopy and particle averaging. Science 341, 655–658 (2013)
  93. Izeddin, I. et al. PSF shaping using adaptive optics for three-dimensional single-molecule super-resolution imaging and tracking. Opt. Express 20, 4957–4967 (2012)
  94. Paszek, M. J. et al. Scanning angle interference microscopy reveals cell dynamics at the nanoscale. Nat. Methods 9, 825–827 (2012)
  95. Chen, B. C. et al. Lattice light-sheet microscopy: imaging molecules to embryos at high spatiotemporal resolution. Science 346, 1257998 (2014)
  96. Legant, W. R. et al. High-density three-dimensional localization microscopy across large volumes. Nat. Methods 13, 359–365 (2016)
  97. Chen, J. et al. Single-molecule dynamics of enhanceosome assembly in embryonic stem cells. Cell 156, 1274–1285 (2014)
  98. Izeddin, I. et al. Single-molecule tracking in live cells reveals distinct target-search strategies of transcription factors in the nucleus. eLife 3, e02230 (2014)
  99. Hansen, A. S., Pustova, I., Cattoglio, C., Tjian, R. & Darzacq, X. CTCF and cohesin regulate chromatin loop stability with distinct dynamics. eLife 6, e25776 (2017)
  100. Ghosh, R., Draper, W., Franklin, J. M., Shi, Q. & Liphardt, J. A fluorogenic nanobody array tag for prolonged single molecule imaging in live cells. Preprint at http://www.biorxiv.org/content/early/2017/07/03/159004 (2017)
  101. Grimm, J. B. et al. A general method to improve fluorophores for live-cell and single-molecule microscopy. Nat. Methods 12, 244–250 (2015)


Author information

Affiliations

  1. Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Howard Hughes Medical Institute, Worcester, Massachusetts 01605, USA

    • Job Dekker
  2. Department of Cell and Developmental Biology, University of Illinois, Urbana-Champaign, Illinois 61801, USA

    • Andrew S. Belmont
  3. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, USA

    • Mitchell Guttman
  4. Department of Bioengineering, University of California San Diego, La Jolla, California 92093, USA

    • Victor O. Leshyk &
    • Sheng Zhong
  5. Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA

    • John T. Lis
  6. Department of Biochemistry and Molecular Biophysics, Mortimer B. Zuckerman Mind Brain and Behavior Institute, Columbia University, New York, New York 10027, USA

    • Stavros Lomvardas
  7. Institute for Medical Engineering and Science, and Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

    • Leonid A. Mirny
  8. Molecular and Cell Biology Laboratory, Salk Institute for Biological Studies, La Jolla, California 92037, USA

    • Clodagh C. O’Shea
  9. Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA

    • Peter J. Park
  10. Ludwig Institute for Cancer Research, Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, Moores Cancer Center, University of California San Diego, La Jolla, California 92093, USA

    • Bing Ren
  11. Basic Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, Washington 98109, USA

    • Joan C. Ritland Politz
  12. Department of Genome Sciences, University of Washington, Howard Hughes Medical Institute, Seattle, Washington 98109, USA

    • Jay Shendure

Consortia

  1. the 4D Nucleome Network

  2. A list of participants and their affiliations appears in the Supplementary Information.

Contributions

All authors contributed to writing the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

Reviewer Information Nature thanks G. Almouzni, G. Cavalli and H. Stunnenberg for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details

Additional data