Drug hunters uncloak the non-coding ‘hidden’ genome

A Publisher Correction to this article was published on 11 November 2021

This article has been updated

Scientists are using new tools to mine the non-coding part of the genome, known as ‘dark matter’, to uncover disease-linked changes in promoters, silencers and enhancers for target discovery.

University of Oxford spinout Nucleome Therapeutics made a splash this past June with a publication describing Micro-Capture-C—a technique that defines physical contacts between gene regulatory elements at base-pair resolution. This method makes it possible to scour the chromatin architecture with unprecedented precision, allowing researchers to identify a burgeoning number of interactions between enhancers and promoters. Such in-depth knowledge of the regulatory elements and the ways that they influence transcription could provide insights into how genes are dysregulated in various diseases—and give drug hunters new opportunities to restore this activity to healthy levels.

Visualizing the genome in three dimensions can uncover disease-linked genetic variants and how they are regulated in health and disease. Image adapted with permission from T. J. Stevens et al. Nature 544, 59–64 (2017), Springer Nature.

Several other startups have made similar plays to develop drugs for targets hidden within the chromatin, and some have drawn considerable investor attention (Table 1). For example, Cambridge, Massachusetts-based Omega Therapeutics pulled in $126 million in funding this past March, and CAMP4 Therapeutics—also in Cambridge—raised $45 million in June. Nucleome CEO Danuta Jeziorska says the space is “still quite nascent at the moment,” but is attracting increasing interest.

Table 1 A selected list of companies developing drugs based on gene-regulatory insights from genomic non-coding regions

Protein-coding sequences compose only 1–2% of the human genome. The remaining terrain conceals myriad enhancers, silencers, promoters and other sequences that coordinate the expression of those genes. Such sequences can be scattered over a considerable distance from the gene they control, and researchers can use ‘chromosome conformation capture’ (3C)-style experimental techniques to get a sense of which genomic regions are interacting to modulate gene activity. But until the advent of Micro-Capture-C, these 3C techniques offered insufficient resolution to precisely home in on the actual sequences involved. This is important because mutations within these poorly characterized but essential DNA elements can potentially have as profound a biological impact as changes to a protein-coding sequence.

James Davies, a University of Oxford researcher and co-founder of Nucleome, notes that such non-coding changes comprise upward of 90% of the sequence variants linked to disease risk in genome-wide association studies (GWAS)—large-scale population studies that compare affected and control cohorts to identify DNA sequences with a robust statistical association with a particular condition. By diving into these non-coding variants, researchers could therefore gain essential insights into how individual genes are expressed in both health and disease. “The big goal is to try and understand the grammar that’s controlling transcription,” says Davies.

Many drug discovery efforts have made sizeable bets on genomics and sequencing as a part of the toolkit for identifying potential targets. “We’ve sequenced nearly 2 million people around the world through a hundred different projects,” says Aris Baras, who heads the Regeneron Genetics Center in Tarrytown, New York. But for the most part, the focus of such efforts in the pharmaceutical industry is squarely on the subset of sequence that encodes proteins, known as the exome.

But a few of these companies are making forays into non-coding sequences. Slavé Petrovski, head of the Centre for Genomics Research at AstraZeneca, highlights his company’s collaboration with the UK Biobank, which will ultimately provide access to half a million whole-genome sequences coupled to relevant medical data. “This will allow us to start studying the clinical relevance of variants outside of the exome,” he says. His team has also developed a software tool, called JARVIS, that uses deep learning to identify non-coding variants with likely clinical significance.

Ultimately, whether or not a gene is transcribed is coordinated through the complex three-dimensional interactions of multiple distinct DNA sequences, which are in turn facilitated by proteins binding selectively to those sequences. These proteins can also introduce epigenetic modifications to the chromatin, which play a pivotal role in determining which genes are active.

Identifying and studying these regulatory elements can be very challenging. “Some genes are controlled by enhancers really close to the promoter, sometimes they’re a million base pairs away,” says Davies. Furthermore, these interactions can be highly cell type and tissue specific and dependent on a host of other physiological conditions, making it difficult to identify the players and their parts.

Academic researchers are currently leading the charge. The New York University-based Dark Matter Project, for example, is a collaboration in which labs around the world are working to dissect the structure and function of the regulatory machinery associated with individual genes of scientific or biomedical interest. “We’re doing maybe a half-dozen loci a year,” says project co-lead Matthew Maurano. Progress in academic labs has given rise to dark-matter-focused startups including Nucleome, Omega and CAMP4. “We have been zeroing in with greater and greater resolution on both the structure and the functional implications of three-dimensional chromatin structure on gene regulation,” says Omega CSO Thomas McCauley.

High resolution is a critical feature of Nucleome’s core technology. Like other 3C methods, Micro-Capture-C entails chemical cross-linking of the chromosomes in order to lock them into their three-dimensional conformations. Enzymes are then used to cleave the DNA into small fragments, which are isolated and sequenced to identify which elements physically interact. But whereas existing 3C methods generally involve digestion using restriction enzymes, which cut at only a limited number of sequence-defined sites throughout the genome, Micro-Capture-C employs a combination of 3C improvements to achieve base-pair resolution. One of these is a randomly cutting enzyme called micrococcal nuclease, which generates a wider range of fragments per interaction site. When combined with other technical improvements to the 3C assay format, this makes it possible to zoom in on specific sequences involved in chromosomal interactions, rather than broad windows spanning several hundred bases within which an interaction is occurring.
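The resolution argument above can be made concrete with a toy simulation. The sketch below is an illustration only and is not Nucleome's pipeline or the published Micro-Capture-C analysis: the locus length, the contact position, the evenly spaced restriction sites and the number of random cuts per cell are all assumptions chosen for the example. It shows that when every cell is cut at the same sequence-defined sites, a contact can be placed only within one fixed fragment, whereas intersecting many differently cut fragments that all contain the contact narrows the window.

```python
import bisect
import random


def fragment_containing(cut_sites, position, genome_length):
    """Return the half-open interval [start, end) of the fragment holding `position`."""
    sites = sorted(set(cut_sites))
    i = bisect.bisect_right(sites, position)
    start = sites[i - 1] if i > 0 else 0
    end = sites[i] if i < len(sites) else genome_length
    return start, end


def localize_contact(fragments):
    """Intersect fragments that all contain the same contact point."""
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    return start, end


GENOME_LENGTH = 10_000  # toy locus in base pairs (assumption)
CONTACT = 4_321         # hypothetical position of the interacting element

# Sequence-defined cutting: identical sites in every cell (assumed every 1,000 bp).
restriction_cuts = list(range(0, GENOME_LENGTH, 1_000))
restriction_window = fragment_containing(restriction_cuts, CONTACT, GENOME_LENGTH)

# Random cutting: each simulated cell gets its own cut sites (assumed 10 per cell).
rng = random.Random(0)
random_fragments = []
for _ in range(200):
    cuts = [rng.randrange(GENOME_LENGTH) for _ in range(10)]
    random_fragments.append(fragment_containing(cuts, CONTACT, GENOME_LENGTH))
random_window = localize_contact(random_fragments)

print("restriction-enzyme window:", restriction_window)
print("random-cut window:", random_window)
```

In this toy model the restriction-enzyme window is always the same 1,000-base fragment, while the random-cut window can only shrink as more fragments are added. Real experiments involve many further steps, such as cross-linking, capture and sequencing of ligation junctions, that are not modeled here.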

Nucleome uses a machine-learning-based computational approach to identify disease-related non-coding variants, followed by wet-lab analysis to determine how these changes alter three-dimensional chromosomal organization. “From this analysis, you can go from the regulatory region of the variant to the gene that it is affecting,” says Jeziorska. The company will initially focus on studying perturbations in gene regulation in lymphocytes as a means to identify new therapeutic targets for autoimmune diseases. But rather than attempting to modulate gene regulation directly by, for example, manipulating epigenetic alterations or directly repairing regulatory mutations with tools like CRISPR genome editing, Nucleome aims to uncover disease-related pathways that can be modulated with conventional small molecules or biologics.

Other companies are targeting the gene-regulatory machinery in a more direct fashion. CAMP4’s therapeutic programs target various classes of non-coding RNAs that are specifically expressed at gene enhancers and promoters. “It’s become clear recently that these RNA molecules themselves are regulators of transcription of genes in their vicinity,” says CSO David Bumcrot. These ‘regulatory RNAs’ act via a variety of mechanisms; some bolster expression by helping to recruit transcription factors, while others exert an inhibitory effect by drawing those factors away from genomic regulatory elements.

The effects of these RNAs are subtle, but potentially meaningful. “They’re sort of a rheostat, tuning a gene up and down two-, three- or fourfold, rather than logarithmically,” says Bumcrot. “There’s a huge range of diseases where that degree of upregulation fits in perfectly.” CAMP4 is developing antisense oligonucleotides that bind to, and modulate the effects of, regulatory RNAs to boost the activity of genes that are underexpressed in genetic disorders. The company identifies the relevant regulatory RNAs for a given gene through an extensive process of multi-omic analysis, which produces a map of the regulatory elements and their three-dimensional organization, as well as insights into the gene’s chromatin state and collective RNA output, both coding and non-coding.

The company’s lead program targets Dravet syndrome, an epileptic disorder in which patients have only one functional copy of an essential gene encoding a neuronal sodium channel, and Bumcrot anticipates beginning clinical trials in late 2022.

Omega’s approach goes even further into the weeds of regulatory control, employing engineered proteins that chemically modify chromatin to increase or suppress gene activity. This strategy builds on research by founding scientific advisors Rudolf Jaenisch and Rick Young, who identified the existence of so-called ‘insulated genomic domains’ (IGDs). These are naturally occurring three-dimensional loops of chromosomal DNA that create defined neighborhoods of genes, which are synergistically regulated by enhancers and promoters within the loop. “They tend to be co-regulated and implicated in the same sorts of disease pathways,” explains McCauley.

Once an IGD and the genes it houses have been associated with a disorder, Omega researchers generate what they call ‘Omega Epigenomic Controllers (OECs)’. These synthetic proteins feature a DNA-binding domain designed to recognize a specific 21-base sequence within the IGD in the genome, coupled to an effector domain consisting of an enzyme that chemically modifies the local chromatin to increase or decrease gene expression as desired. To deliver the treatment, the researchers encapsulate mRNA transcripts encoding the OECs within lipid nanoparticles; the mRNA is translated into the OECs after being taken up by the targeted cells. The therapeutic itself is in the body only briefly, but the resulting effects can be quite durable. “We can engineer a duration of effect anywhere from a couple of days to a couple of weeks to a couple of months,” says McCauley. Omega plans to file an IND for its most advanced clinical program, targeting the c-Myc oncogene in liver cancer, in early 2022.

For all these strategies, identifying a promising ‘hit’ in areas of the genome outside protein-coding regions is only the beginning. In contrast to coding-sequence mutations, whose effects can often be anticipated, mutations in regulatory elements require extensive experimental characterization. “I don’t think that we’re necessarily going to have a website where you drop in your GWAS results and it tells you ‘go drug this gene’,” says Maurano. And since the field is still in its infancy, drugging the ‘regulome’ is likely to remain a labor-intensive affair for the foreseeable future.

Until these efforts bear clinical fruit, many big pharma companies will likely remain intrigued spectators. “I’m not sure in the near term if it’s going to be hugely game-changing,” says Morten Sogaard, head of target sciences at Pfizer. “It’s a very interesting space, but I would see it as probably more suited for a startup.” On the other hand, Regeneron’s Baras notes that there have already been a few examples of researchers achieving clinical impact by targeting non-coding elements. For example, CRISPR Therapeutics’ gene therapy approach for sickle-cell disease, which works by disrupting expression of a regulatory-element-binding protein that inhibits production of γ-globin, has shown promising results in clinical trials. “I think the science is fantastic,” he says. “You just have to find those kinds of targets and biological stories where these technologies can really be the heroes.”

Cite this article

Eisenstein, M. Drug hunters uncloak the non-coding ‘hidden’ genome. Nat Biotechnol 39, 1169–1171 (2021).