  • Perspective
  • Published:

Hold out the genome: a roadmap to solving the cis-regulatory code

Abstract

Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The ‘cis-regulatory code’ — how cells interpret DNA sequences to determine when, where and how much genes should be expressed — has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
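The argument that the genome is too short to cover the relevant sequence space can be illustrated with rough arithmetic. The sketch below is an illustration only, not part of the article's analysis. It compares the number of possible DNA sequences of length L, which is 4^L, with an upper bound on the number of windows of that length in one genome. The genome size of about 3.1 billion bp, the treatment of the genome as one linear string, and all function names are assumptions made for this example.

```python
# Rough comparison of DNA sequence space with what a single genome can contain.
# GENOME_BP is an assumed approximate haploid human genome size, used only
# for illustration.
GENOME_BP = 3_100_000_000


def possible_sequences(length: int) -> int:
    """Number of distinct DNA sequences of this length (4 bases per position)."""
    return 4 ** length


def genome_windows(length: int, genome_bp: int = GENOME_BP) -> int:
    """Upper bound on distinct windows of this length, counting both strands.

    Treats the genome as one linear string and ignores repeats, so the
    true number of distinct windows is lower.
    """
    return 2 * max(genome_bp - length + 1, 0)


def coverage_fraction(length: int, genome_bp: int = GENOME_BP) -> float:
    """Largest fraction of sequence space the genome could possibly sample."""
    return min(1.0, genome_windows(length, genome_bp) / possible_sequences(length))


if __name__ == "__main__":
    for length in (10, 20, 50, 200):
        print(f"L={length:>3}  max fraction covered: {coverage_fraction(length):.3e}")
```

Under these assumptions, a genome can contain every possible 10-mer, but at most a small fraction of all 20-mers, and the fraction shrinks rapidly for longer sequences. Random synthetic libraries sample from this much larger space.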

Fig. 1: Solving cis-regulation requires learning hundreds of millions of parameters.
Fig. 2: Advantages of random DNA in learning cis-regulation.
Fig. 3: Current and future technologies to solve cis-regulation.

References

  1. Kim, S. & Wysocka, J. Deciphering the multi-scale, quantitative cis-regulatory code. Mol. Cell 83, 373–392 (2023).

    Article  CAS  PubMed  Google Scholar 

  2. Lambert, S. A. et al. The human transcription factors. Cell 172, 650–665 (2018).

    Article  CAS  PubMed  Google Scholar 

  3. Zeitlinger, J. Seven myths of how transcription factors read the cis-regulatory code. Curr. Opin. Syst. Biol. 23, 22–31 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  4. Baralle, M. & Baralle, F. E. The splicing code. Biosystems 164, 39–48 (2018).

    Article  CAS  PubMed  Google Scholar 

  5. Morris, C., Cluet, D. & Ricci, E. P. Ribosome dynamics and mRNA turnover, a complex relationship under constant cellular scrutiny. Wiley Interdiscip. Rev. RNA 12, e1658 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Borbolis, F. & Syntichaki, P. Cytoplasmic mRNA turnover and ageing. Mech. Ageing Dev. 152, 32–42 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Nieuwkoop, T., Finger-Bou, M., van der Oost, J. & Claassens, N. J. The ongoing quest to crack the genetic code for protein production. Mol. Cell 80, 193–209 (2020).

    Article  CAS  PubMed  Google Scholar 

  8. Cramer, P. Organization and regulation of gene transcription. Nature 573, 45–54 (2019).

    Article  CAS  PubMed  ADS  Google Scholar 

  9. Furlong, E. E. M. & Levine, M. Developmental enhancers and chromosome topology. Science 361, 1341–1345 (2018).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  10. Michael, A. K. & Thomä, N. H. Reading the chromatinized genome. Cell 184, 3599–3611 (2021).

    Article  CAS  PubMed  Google Scholar 

  11. Roeder, R. G. 50+ years of eukaryotic transcription: an expanding universe of factors and mechanisms. Nat. Struct. Mol. Biol. 26, 783–791 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Field, A. & Adelman, K. Evaluating enhancer function and transcription. Annu. Rev. Biochem. 89, 213–234 (2020).

    Article  CAS  PubMed  Google Scholar 

  13. Cohen, B. A. How should novelty be valued in science? eLife 6, e28699 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  14. Vaishnav, E. D. et al. The evolution, evolvability and engineering of gene regulatory DNA. Nature 603, 455–463 (2022). This paper demonstrates that random DNA-trained cis-regulatory models are useful for understanding cis-regulatory evolution and correctly predicted functional cis-regulatory variation.

    Article  CAS  PubMed  ADS  Google Scholar 

  15. Wittkopp, P. J. & Kalay, G. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69 (2012).

    Article  CAS  Google Scholar 

  16. Wray, G. A. The evolutionary significance of cis-regulatory mutations. Nat. Rev. Genet. 8, 206–216 (2007).

    Article  CAS  PubMed  Google Scholar 

  17. Farh, K. K. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).

    Article  CAS  PubMed  ADS  Google Scholar 

  18. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012). This paper reports that most genome-wide association study variation appears to be regulatory, a finding that has since been replicated for most complex traits.

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  19. Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020). In this paper, the authors use DNase I footprinting to show that most human enhancers appear to have a relatively simple logic with few strict spacing or positional requirements.

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  20. Arnosti, D. N. & Kulkarni, M. M. Transcriptional enhancers: intelligent enhanceosomes or flexible billboards? J. Cell. Biochem. 94, 890–898 (2005).

    Article  CAS  PubMed  Google Scholar 

  21. de Boer, C. G. et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat. Biotechnol. 38, 56–65 (2020). This paper demonstrates that the cis-regulatory activity of random DNA can be used to model many of the parameters of cis-regulation.

    Article  PubMed  Google Scholar 

  22. Tycko, J. et al. High-throughput discovery and characterization of human transcriptional effectors. Cell 183, 2020–2035.e16 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Alerasool, N., Leng, H., Lin, Z.-Y., Gingras, A.-C. & Taipale, M. Identification and functional characterization of transcriptional activators in human cells. Mol. Cell 82, 677–695.e7 (2022).

    Article  CAS  PubMed  Google Scholar 

  24. Reiter, F., Wienerroither, S. & Stark, A. Combinatorial function of transcription factors and cofactors. Curr. Opin. Genet. Dev. 43, 73–81 (2017).

    Article  CAS  PubMed  Google Scholar 

  25. Wei, B. et al. A protein activity assay to measure global transcription factor activity reveals determinants of chromatin accessibility. Nat. Biotechnol. 36, 521–529 (2018).

    Article  CAS  PubMed  Google Scholar 

  26. Sahu, B. et al. Sequence determinants of human gene regulatory elements. Nat. Genet. 54, 283–294 (2022). In this paper, the authors show that random DNA has regulatory activity in human cells and that it can be used to learn cis-regulatory models.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Balsalobre, A. & Drouin, J. Pioneer factors as master regulators of the epigenome and cell fate. Nat. Rev. Mol. Cell Biol. 23, 449–464 (2022).

    Article  CAS  PubMed  Google Scholar 

  28. Sharon, E. et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat. Biotechnol. 30, 521–530 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Grossman, S. R. et al. Positional specificity of different transcription factor classes within enhancers. Proc. Natl Acad. Sci. USA 115, E7222–E7230 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Chen, L., Glover, J. N., Hogan, P. G., Rao, A. & Harrison, S. C. Structure of the DNA-binding domains from NFAT, Fos and Jun bound specifically to DNA. Nature 392, 42–48 (1998).

    Article  CAS  PubMed  ADS  Google Scholar 

  31. Perkins, N. D. et al. A cooperative interaction between NF-κB and Sp1 is required for HIV-1 enhancer activation. EMBO J. 12, 3551–3558 (1993).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Martinez, G. J. & Rao, A. Immunology. Cooperative transcription factor complexes in control. Science 338, 891–892 (2012).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  33. Jolma, A. et al. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 527, 384–388 (2015). In this paper, the authors systematically test pairs of transcription factors to see which could bind cooperatively to the DNA using high-throughput sequencing SELEX, revealing that many transcription factor pairs prefer to bind in one or a few of the possible relative arrangements.

    Article  CAS  PubMed  ADS  Google Scholar 

  34. Henikoff, S. & Shilatifard, A. Histone modification: cause or cog? Trends Genet. 27, 389–396 (2011).

    Article  CAS  PubMed  Google Scholar 

  35. Loaeza-Loaeza, J., Beltran, A. S. & Hernández-Sotelo, D. DNMTs and impact of CpG content, transcription factors, consensus motifs, lncRNAs, and histone marks on DNA methylation. Genes 11, 1336 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Blattler, A. & Farnham, P. J. Cross-talk between site-specific transcription factors and DNA methylation states. J. Biol. Chem. 288, 34287–34294 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Schübeler, D. Function and information content of DNA methylation. Nature 517, 321–326 (2015).

    Article  PubMed  ADS  Google Scholar 

  38. Kreibich, E., Kleinendorst, R., Barzaghi, G., Kaspar, S. & Krebs, A. R. Single-molecule footprinting identifies context-dependent regulation of enhancers by DNA methylation. Mol. Cell 83, 787–802.e9 (2023).

    Article  CAS  PubMed  Google Scholar 

  39. Yin, Y. et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science 356, eaaj2239 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  40. Vinson, C. & Chatterjee, R. CG methylation. Epigenomics 4, 655–663 (2012).

    Article  CAS  PubMed  Google Scholar 

  41. Leman, A. R. & Noguchi, E. The replication fork: understanding the eukaryotic replication machinery and the challenges to genome duplication. Genes 4, 1–32 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  42. Flury, V. et al. Recycling of modified H2A-H2B provides short-term memory of chromatin states. Cell 186, 1050–1065.e19 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Laprell, F., Finkl, K. & Müller, J. Propagation of Polycomb-repressed chromatin requires sequence-specific recruitment to DNA. Science 356, 85–88 (2017).

    Article  CAS  PubMed  ADS  Google Scholar 

  44. Coleman, R. T. & Struhl, G. Causal role for inheritance of H3K27me3 in maintaining the OFF state of a Drosophila HOX gene. Science 356, eaai8236 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  45. Hua, P. et al. Defining genome architecture at base-pair resolution. Nature 595, 125–129 (2021).

    Article  CAS  PubMed  ADS  Google Scholar 

  46. Lieberman-Aiden, E. et al. Comprehensive mapping of long range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  47. Eagen, K. P. Principles of chromosome architecture revealed by Hi-C. Trends Biochem. Sci. 43, 469–478 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Van Bortle, K. & Corces, V. G. tDNA insulators and the emerging role of TFIIIC in genome organization. Transcription 3, 277–284 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  49. Fulco, C. P. et al. Systematic mapping of functional enhancer-promoter connections with CRISPR interference. Science 354, 769–773 (2016).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  50. Klann, T. S. et al. CRISPR–Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome. Nat. Biotechnol. 35, 561–568 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. de Boer, C. G., Ray, J. P., Hacohen, N. & Regev, A. MAUDE: inferring expression changes in sorting-based CRISPR screens. Genome Biol. 21, 134 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  52. Rippe, K. Liquid-liquid phase separation in chromatin. Cold Spring Harb. Perspect. Biol. 14, a040683 (2022).

    Article  CAS  PubMed  Google Scholar 

  53. Hnisz, D., Shrinivas, K., Young, R. A., Chakraborty, A. K. & Sharp, P. A. A phase separation model for transcriptional control. Cell 169, 13–23 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Mirny, L. A. Nucleosome-mediated cooperativity between transcription factors. Proc. Natl Acad. Sci. USA 107, 22534–22539 (2010).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  55. Morgunova, E. & Taipale, J. Structural perspective of cooperative transcription factor binding. Curr. Opin. Struct. Biol. 47, 1–8 (2017).

    Article  CAS  PubMed  Google Scholar 

  56. Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021). In this paper, the authors make exceptional machine learning models that capture highly complex ChIP-nexus data for pluripotency transcription factors, revealing certainsofttranscription factor interactions.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Cheng, Q. et al. Computational identification of diverse mechanisms underlying transcription factor-DNA occupancy. PLoS Genet. 9, e1003571 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Jindal, G. & Farley, E. Enhancer grammar in development, evolution, and disease — dependencies and interplay. Dev. Cell 56, 575–587 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. de Almeida, B. P., Reiter, F., Pagani, M. & Stark, A. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 54, 613–624 (2022).

    Article  PubMed  Google Scholar 

  60. Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).

    Article  CAS  PubMed  Google Scholar 

  61. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021). This paper describes a deep learning transformer-based sequence-to-expression predictor for the human genome.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Agarwal, V. & Shendure, J. Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks. Cell Rep. 31, 107663 (2020).

    Article  CAS  PubMed  Google Scholar 

  63. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Chen, K. M., Wong, A. K., Troyanskaya, O. G. & Zhou, J. A sequence-based global map of regulatory activity for deciphering human genetics. Nat. Genet. 54, 940–949 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Horton, C. A. et al. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Preprint at bioRxiv https://doi.org/10.1101/2022.05.24.493321 (2022).

  66. Janssens, J. et al. Decoding gene regulation in the fly brain. Nature 601, 630–636 (2022). This work describes a deep learning model that can predict tissue specificity of enhancers in the Drosophila brain based on single-cell ATAC-seq data.

    Article  CAS  PubMed  ADS  Google Scholar 

  67. He, Q., Johnston, J. & Zeitlinger, J. ChIP-nexus enables improved detection of in vivo transcription factor binding footprints. Nat. Biotechnol. 33, 395–401 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Karollus, A., Mauermeier, T. & Gagneur, J. Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol. 24, 56 (2023). This paper performs a rigorous evaluation of state-of-the-art cis-regulatory deep learning models trained on genomics data, noting substantial limitations.

    Article  PubMed  PubMed Central  Google Scholar 

  70. Sasse, A. et al. How far are we from personalized gene expression prediction using sequence-to-expression deep neural networks? Preprint at bioRxiv https://doi.org/10.1101/2023.03.16.532969 (2023).

  71. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

    Article  CAS  PubMed  ADS  Google Scholar 

  72. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

    Article  ADS  Google Scholar 

  73. Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  74. Zhang, K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985–6001.e19 (2021). This article provides an atlas of human single-cell ATAC-seq data, demonstrating the amount of specific open chromatin regions in individual human cell types.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Whalen, S., Schreiber, J., Noble, W. S. & Pollard, K. S. Navigating the pitfalls of applying machine learning in genomics. Nat. Rev. Genet. 23, 169–181 (2022).

    Article  CAS  PubMed  Google Scholar 

  76. de Koning, A. P. J., Gu, W., Castoe, T. A., Batzer, M. A. & Pollock, D. D. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet. 7, e1002384 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  77. Lee, J. M. & Sonnhammer, E. L. L. Genomic gene clustering analysis of pathways in eukaryotes. Genome Res. 13, 875–882 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Hurst, L. D., Pál, C. & Lercher, M. J. The evolutionary dynamics of eukaryotic gene order. Nat. Rev. Genet. 5, 299–310 (2004).

    Article  CAS  PubMed  Google Scholar 

  79. Lercher, M. J., Urrutia, A. O. & Hurst, L. D. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat. Genet. 31, 180–183 (2002).

    Article  CAS  PubMed  Google Scholar 

  80. Cannavò, E. et al. Shadow enhancers are pervasive features of developmental regulatory networks. Curr. Biol. 26, 38–51 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  81. Barolo, S. Shadow enhancers: frequently asked questions about distributed cis-regulatory information and enhancer redundancy. BioEssays 34, 135–141 (2012).

    Article  CAS  PubMed  Google Scholar 

  82. Li, S. & Ovcharenko, I. Enhancer jungles establish robust tissue-specific regulatory control in the human genome. Genomics 112, 2261–2270 (2020).

    Article  CAS  PubMed  Google Scholar 

  83. Hong, J.-W., Hendrix, D. A. & Levine, M. S. Shadow enhancers as a source of evolutionary novelty. Science 321, 1314 (2008).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  84. Gotea, V. et al. Homotypic clusters of transcription factor binding sites are a key component of human promoters and enhancers. Genome Res. 20, 565–577 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Luna-Zurita, L. et al. Complex interdependence regulates heterotypic transcription factor distribution and coordinates cardiogenesis. Cell 164, 999–1014 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Schölkopf, B. et al. Toward causal representation learning. Proc. IEEE 109, 612–634 (2021).

    Article  Google Scholar 

  87. Whalen, S. & Pollard, K. S. Reply to ‘Inflated performance measures in enhancer–promoter interaction-prediction methods’. Nat. Genet. 51, 1198–1200 (2019).

    Article  CAS  PubMed  Google Scholar 

  88. Cao, F. & Fullwood, M. J. Inflated performance measures in enhancer–promoter interaction-prediction methods. Nat. Genet. 51, 1196–1198 (2019).

    Article  CAS  PubMed  Google Scholar 

  89. Xi, W. & Beer, M. A. Local epigenomic state cannot discriminate interacting and non-interacting enhancer–promoter pairs with high accuracy. PLoS Comput. Biol. 14, e1006625 (2018).

    Article  PubMed  PubMed Central  ADS  Google Scholar 

  90. Barnett, E., Onete, D., Salekin, A. & Faraone, S. V. Genomic machine learning meta-regression: insights on associations of study features with reported model performance. Preprint at medRxiv https://doi.org/10.1101/2022.01.10.22268751 (2022).

  91. Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087 (2016).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  92. Wang, T. et al. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc. Natl Acad. Sci. USA 104, 18613–18618 (2007).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  93. Patwardhan, R. P. et al. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nat. Biotechnol. 27, 1173–1175 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  94. Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol. 30, 271–277 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Kircher, M. et al. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat. Commun. 10, 3583 (2019).

    Article  PubMed  PubMed Central  ADS  Google Scholar 

  96. Kinney, J. B. & McCandlish, D. M. Massively parallel assays and quantitative sequence-function relationships. Annu. Rev. Genomics Hum. Genet. 20, 99–127 (2019).

    Article  CAS  PubMed  Google Scholar 

  97. Lubliner, S. et al. Core promoter sequence in yeast is a major determinant of expression level. Genome Res. 25, 1008–1017 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Gertz, J., Siggia, E. D. & Cohen, B. A. Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 457, 215–218 (2009).

    Article  CAS  PubMed  ADS  Google Scholar 

  99. King, D. M. et al. Synthetic and genomic regulatory elements reveal aspects of cis-regulatory grammar in mouse embryonic stem cells. eLife 9, e41279 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Yuh, C. H. & Davidson, E. H. Modular cis-regulatory organization of Endo16, a gut-specific gene of the sea urchin embryo. Dev. Camb. Engl. 122, 1069–1082 (1996).

    CAS  Google Scholar 

  101. Hossain, A. et al. Automated design of thousands of nonrepetitive parts for engineering stable genetic systems. Nat. Biotechnol. 38, 1466–1475 (2020).

    Article  CAS  PubMed  Google Scholar 

  102. Wilson, D. S. & Szostak, J. W. In vitro selection of functional nucleic acids. Annu. Rev. Biochem. 68, 611–647 (1999).

    Article  CAS  PubMed  Google Scholar 

  103. Keefe, A. D. & Szostak, J. W. Functional proteins from a random-sequence library. Nature 410, 715–718 (2001).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  104. Cuperus, J. T. et al. Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences. Genome Res. 27, 2015–2024 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  105. Sample, P. J. et al. Human 5′ UTR design and variant effect prediction from a massively parallel translation assay. Nat. Biotechnol. 37, 803–809 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Rosenberg, A. B., Patwardhan, R. P., Shendure, J. & Seelig, G. Learning the sequence determinants of alternative splicing from millions of random sequences. Cell 163, 698–711 (2015).

    Article  CAS  PubMed  Google Scholar 

  107. Liao, S. E., Sudarshan, M. & Regev, O. Machine learning for discovery: deciphering RNA splicing logic. Preprint at bioRxiv https://doi.org/10.1101/2022.10.01.510472 (2022).

  108. Bogard, N., Linder, J., Rosenberg, A. B. & Seelig, G. A deep neural network for predicting and engineering alternative polyadenylation. Cell 178, 91–106.e23 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Galupa, R. et al. Enhancer architecture and chromatin accessibility constrain phenotypic space during Drosophila development. Dev. Cell 58, 51–62.e4 (2023). This study demonstrates that random DNA sequences tested in a reporter system show diverse cell-type-specific expression across early Drosophila development.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Wunderlich, Z. & Mirny, L. A. Different gene regulation strategies revealed by analysis of binding motifs. Trends Genet. 25, 434–440 (2009). This paper demonstrates that eukaryotic transcription factors lack sufficient specificity to uniquely specify genes for activation and so must work combinatorially.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  111. Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).

    Article  CAS  PubMed  Google Scholar 

  112. Ogawa, N. & Biggin, M. D. High-throughput SELEX determination of DNA sequences bound by transcription factors in vitro. Methods Mol. Biol. Clifton NJ 786, 51–63 (2012).

    Article  CAS  Google Scholar 

  113. Luthra, I. et al. Biochemical activity is the default DNA state in eukaryotes. Preprint at bioRxiv https://doi.org/10.1101/2022.12.16.520785 (2022).

  114. Ni, X. et al. Adaptive evolution and the birth of CTCF binding sites in the Drosophila genome. PLoS Biol. 10, e1001420 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  115. Weirauch, M. T. & Hughes, T. R. Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same. Trends Genet. 26, 66–74 (2010).

    Article  CAS  PubMed  Google Scholar 

  116. Wong, E. S. et al. Deep conservation of the enhancer regulatory code in animals. Science 370, eaax8137 (2020).

    Article  CAS  PubMed  ADS  Google Scholar 

  117. Villar, D. et al. Enhancer evolution across 20 mammalian species. Cell 160, 554–566 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  118. Cotney, J. et al. The evolution of lineage-specific regulatory activities in the human embryonic limb. Cell 154, 185–196 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  119. Arnold, C. D. et al. Quantitative genome-wide enhancer activity maps for five Drosophila species show functional enhancer conservation and turnover during cis-regulatory evolution. Nat. Genet. 46, 685–692 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  120. Eichenlaub, M. P. & Ettwiller, L. De novo genesis of enhancers in vertebrates. PLoS Biol. 9, e1001188 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  121. Gvozdenov, Z., Barcutean, Z. & Struhl, K. Functional analysis of a random-sequence chromosome reveals a high level and the molecular nature of transcriptional noise in yeast cells.Mol. Cell 83, 1786–1797 (2023).

  122. Maniatis, T. et al. Structure and function of the interferon-β enhanceosome. Cold Spring Harb. Symp. Quant. Biol. 63, 609–620 (1998).

    Article  CAS  PubMed  Google Scholar 

  123. Panne, D., Maniatis, T. & Harrison, S. C. An atomic model of the interferon-β enhanceosome. Cell 129, 1111–1123 (2007). This structural study describes binding of transcription factors in a highly optimized and compact human enhancer.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  124. Emera, D., Yin, J., Reilly, S. K., Gockley, J. & Noonan, J. P. Origin and evolution of developmental enhancers in the mammalian neocortex. Proc. Natl Acad. Sci. USA 113, E2617–E2626 (2016).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  125. Fong, S. L. & Capra, J. A. Modeling the evolutionary architectures of transcribed human enhancer sequences reveals distinct origins, functions, and associations with human trait variation. Mol. Biol. Evol. 38, 3681–3696 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  126. Friedman, R. Z. et al. Active learning of enhancer and silencer regulatory grammar in photoreceptors. Preprint at bioRxiv https://doi.org/10.1101/2023.08.21.554146 (2023).

  127. Arnold, C. D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).

    Article  CAS  PubMed  ADS  Google Scholar 

  128. Neumayr, C., Pagani, M., Stark, A. & Arnold, C. D. STARR-seq and UMI-STARR-seq: assessing enhancer activities for genome-wide-, high-, and low-complexity candidate libraries. Curr. Protoc. Mol. Biol. 128, e105 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  129. Muerdter, F. et al. Resolving systematic errors in widely used enhancer activity assays in human cells. Nat. Methods 15, 141–149 (2018).

    Article  CAS  PubMed  Google Scholar 

  130. Kerkmann, M. et al. Activation with CpG-A and CpG-B oligonucleotides reveals two distinct regulatory pathways of type I IFN synthesis in human plasmacytoid dendritic cells. J. Immunol. 170, 4465–4474 (2003).

    Article  CAS  PubMed  Google Scholar 

  131. Harton, M. D., Koh, W. S., Bunker, A. D., Singh, A. & Batchelor, E. p53 pulse modulation differentially regulates target gene promoters to regulate cell fate decisions. Mol. Syst. Biol. 15, e8685 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  132. Adamson, A. et al. Signal transduction controls heterogeneous NF-κB dynamics and target gene expression through cytokine-specific refractory states. Nat. Commun. 7, 12057 (2016).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  133. Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet. 37, 109–124 (2021).

    Article  CAS  PubMed  Google Scholar 

  134. Lalanne, J.-B. et al. Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters. Preprint at bioRxiv https://doi.org/10.1101/2022.12.10.519236 (2022).

  135. Zhao, S. et al. A single-cell massively parallel reporter assay detects cell-type-specific gene regulation. Nat. Genet. 55, 346–354 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  136. Murtha, M. et al. FIREWACh: high-throughput functional detection of transcriptional regulatory modules in mammalian cells. Nat. Methods 11, 559–565 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  137. Levo, M. et al. Systematic investigation of transcription factor activity in the context of chromatin using massively parallel binding and expression assays. Mol. Cell 65, 604–617.e6 (2017).

  138. Joung, J. et al. A transcription factor atlas of directed differentiation. Cell 186, 209–229.e26 (2023).

  139. Calderon, D. et al. TransMPRA: a framework for assaying the role of many trans-acting factors at many enhancers. Preprint at bioRxiv https://doi.org/10.1101/2020.09.30.321323 (2020).

  140. Ng, A. H. M. et al. A comprehensive library of human transcription factors for cell fate engineering. Nat. Biotechnol. 39, 510–519 (2021).

  141. Sidore, A. M., Plesa, C., Samson, J. A., Lubock, N. B. & Kosuri, S. DropSynth 2.0: high-fidelity multiplexed gene synthesis in emulsions. Nucleic Acids Res. 48, e95 (2020).

  142. Plesa, C., Sidore, A. M., Lubock, N. B., Zhang, D. & Kosuri, S. Multiplexed gene synthesis in emulsions for exploring protein functional landscapes. Science 359, 343–347 (2018).

  143. Camellato, B. R., Brosh, R., Maurano, M. T. & Boeke, J. D. Genomic analysis of a synthetic reversed sequence reveals default chromatin states in yeast and mammalian cells. Preprint at bioRxiv https://doi.org/10.1101/2022.06.22.496726 (2022).

  144. Pinglay, S. et al. Synthetic regulatory reconstitution reveals principles of mammalian Hox cluster regulation. Science 377, eabk2820 (2022). The authors of this study built synthetic variants of the HOXA cluster, comprising up to approximately 170 kb of synthetic DNA, to dissect the regulatory logic of the locus.

  145. Zhao, Y. et al. Debugging and consolidating multiple synthetic chromosomes reveals combinatorial genetic interactions. Cell 186, 5220–5236 (2023).

  146. Venter, J. C., Glass, J. I., Hutchison, C. A. & Vashee, S. Synthetic chromosomes, genomes, viruses, and cells. Cell 185, 2708–2724 (2022).

  147. Boeke, J. D. et al. The Genome Project-Write. Science 353, 126–127 (2016).

  148. Battaglia, S. et al. Long-range phasing of dynamic, tissue-specific and allele-specific regulatory elements. Nat. Genet. 54, 1504–1513 (2022).

  149. Krebs, A. R. Studying transcription factor function in the genome at molecular resolution. Trends Genet. 37, 798–806 (2021).

  150. Stergachis, A. B., Debo, B. M., Haugen, E., Churchman, L. S. & Stamatoyannopoulos, J. A. Single-molecule regulatory architectures captured by chromatin fiber sequencing. Science 368, 1449–1454 (2020). This paper reports genome-scale single-molecule measurements of transcription factor and nucleosome binding across long (approximately 10 kb) chromatin fragments.

  151. Koonin, E. V. Splendor and misery of adaptation, or the importance of neutral null for understanding evolution. BMC Biol. 14, 114 (2016).

  152. Eddy, S. R. The ENCODE project: missteps overshadowing a success. Curr. Biol. 23, R259–R261 (2013).

  153. Kim, J., Koo, B.-K. & Knoblich, J. A. Human organoids: model systems for human biology and medicine. Nat. Rev. Mol. Cell Biol. 21, 571–584 (2020).

  154. Vierbuchen, T. & Wernig, M. Molecular roadblocks for cellular reprogramming. Mol. Cell 47, 827–838 (2012).

  155. Tu, L., Lalwani, G., Gella, S. & He, H. An empirical study on robustness to spurious correlations using pre-trained language models. Trans. Assoc. Comput. Linguist. 8, 621–633 (2020).

  156. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).

  157. Baek, M., McHugh, R., Anishchenko, I., Baker, D. & DiMaio, F. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods https://doi.org/10.1038/s41592-023-02086-5 (2023).

  158. Koo, P. K. & Ploenzke, M. Improving representations of genomic sequence motifs in convolutional networks with exponential activations. Nat. Mach. Intell. 3, 258–266 (2021).

  159. Prakash, E. I., Shrikumar, A. & Kundaje, A. Towards more realistic simulated datasets for benchmarking deep learning models in regulatory genomics. In Proc. 16th Machine Learning in Computational Biology 58–77 (PMLR, 2022).

  160. Rafi, A. M. et al. Evaluation and optimization of sequence-based gene regulatory deep learning models. Preprint at bioRxiv https://doi.org/10.1101/2023.04.26.538471 (2023).

  161. Weirauch, M. T. et al. Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol. 31, 126–134 (2013).

  162. Meyer, P. et al. Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach. Genome Res. 23, 1928–1937 (2013).

  163. Segal, E. et al. A genomic code for nucleosome positioning. Nature 442, 772–778 (2006).

  164. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  165. Pak, M. A. et al. Using AlphaFold to predict the impact of single mutations on protein stability and function. PLoS ONE 18, e0282689 (2023).

  166. Buel, G. R. & Walters, K. J. Can AlphaFold2 predict the impact of missense mutations on structure? Nat. Struct. Mol. Biol. 29, 1–2 (2022).

Acknowledgements

We thank B. Cleary, B. Cohen, G. Eraslan, A. Kundaje, B. Lehner, S. Mostafavi, A. M. Rafi, S. Reilly, C. Rogerson, A. Sasse, J. Schreiber, N. Shakiba, J. Shendure, B. van Steensel, M. Taipale, O. Tariq, X. Tu, M. Underhill, M. Weirauch and N. Yachie for helpful discussions. We apologise to all our colleagues whose work we could not cite due to the limit on the number of references. C.G.d.B. is a Michael Smith Health Research BC Scholar and is supported by a Stem Cell Network Jump Start award (ECR-C4R1-7).

Author information

Contributions

C.G.d.B. and J.T. conceptualized the paper. C.G.d.B. produced the first draft, analysed the data and created the figures with advice from J.T. C.G.d.B. and J.T. edited the manuscript.

Corresponding authors

Correspondence to Carl G. de Boer or Jussi Taipale.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Shaun Mahony and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

de Boer, C.G., Taipale, J. Hold out the genome: a roadmap to solving the cis-regulatory code. Nature 625, 41–50 (2024). https://doi.org/10.1038/s41586-023-06661-w

  • DOI: https://doi.org/10.1038/s41586-023-06661-w
