ADVERTISEMENT FEATURE: Advertiser retains sole responsibility for the content of this article

Beyond GWAS: Tackling the variant-to-function challenge

Scientists have found many variations in DNA that seem to be linked to disease. The next challenge is to determine how they actually affect human health. Credit: bestbrk/Shutterstock

“Human genetics is at a pivotal time,” says Cecilia Lindgren, a human geneticist at the University of Oxford, UK. Although researchers have become adept at discovering disease-associated genetic variants, how these variants affect disease processes, often referred to as variant to function (V2F), is far from clear. “We need to come together as a community and figure out how to overcome the barriers to get from genetic maps to mechanisms of disease to medicines,” she adds.

Lindgren is also chair of the newly launched International Common Disease Alliance (ICDA), which is addressing V2F head-on. ICDA’s aim is first to improve understanding of the impact of genetic variants on health and then to accelerate development of new medicines for common diseases such as type 2 diabetes, autism, multiple sclerosis, schizophrenia and inflammatory bowel disease.

“Genome-wide association studies (GWAS) have identified more than 100,000 examples where natural genetic variants influence risk for common diseases and traits,” says Jesse Engreitz, a geneticist at the Broad Institute in Cambridge, Massachusetts, and a member of ICDA. “Each of these variants could provide key insights to guide therapeutic discovery, but only if we can connect them to their functions in particular genes, cell types and pathways that lead to disease.”

Determining the relationship and significance of genetic variants to disease has been a slow and involved process. The contribution of most variants to disease risk is very small, which can make their effects hard to detect. In many cases the variants are in the non-coding part of the genome where they have unpredictable effects on the expression of multiple genes. Furthermore, any effects they have may be cell-type specific and strongly influenced by genetic background and environmental factors.

Nevertheless, there have been some remarkable advances. Ten years ago, Vijay Sankaran, a physician-scientist who specializes in blood disorders at Boston Children’s Hospital, used GWAS to identify the BCL11A gene as a regulator of human fetal haemoglobin, which is known to ameliorate sickle cell disease [1]. This finding inspired clinical trials to explore whether reducing BCL11A expression in patients’ blood stem cells would increase fetal haemoglobin production. “Although it has taken 10 years to get here,” says Sankaran, “we now have the tools to examine a greater number of variants than ever before, and really start to understand what is going on in a range of diseases.”

The falling price of sequencing and advances in sequencing technologies mean that more researchers are addressing V2F. As well as clinical phenotypes and genetic sequencing data, researchers are also able to analyse gene expression (transcriptome), protein production (proteome) and cellular function (metabolome). Together, these multi-omic data will help elucidate the potential causes of common diseases and identify personalized treatment targets (see figure).

Identifying functionally significant variants

As a first step to understanding the V2F relationship, researchers are developing tools to prioritize which variants to study. “At the moment, there is no clear recipe to determining which gene variants have the greatest effect on the pathogenesis of common disease,” says Lindgren. Variants that alter protein function are a good place to start. Anne O'Donnell-Luria, a clinical geneticist at Boston Children’s Hospital and associate director of the Center for Mendelian Genomics at the Broad Institute, is carrying out whole exome sequencing in thousands of families to identify novel disease-causing genes. “We are looking for genetic variants that could be having a deleterious effect on a protein that is not yet known to be implicated in the disease,” she explains.

By using massive databases such as ClinVar, an evidence-supported public archive of relationships between human genetic variations and phenotypes, O'Donnell-Luria can identify ‘likely pathogenic’ variants that are shared among individuals with a similar phenotype. From the genetic sequence alone it is possible to predict whether the variant will lead to an amino-acid change in the protein or to a stop codon that terminates translation early and results in a truncated protein.
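The sequence-level prediction described above can be illustrated with a toy example. The sketch below is not O'Donnell-Luria's pipeline or any real annotation tool; it assumes a single in-frame codon on the coding strand and uses the standard genetic code, and the names `CODON_TABLE` and `classify_substitution` are hypothetical. Real variant annotation also has to account for transcripts, strand, splicing and indels.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon, position, alt_base):
    """Classify a single-base change within one codon (hypothetical helper).

    ref_codon: three-letter reference codon on the coding strand.
    position:  0-based index of the changed base within the codon.
    alt_base:  the substituted base.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:position] + alt_base.upper() + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"   # same amino acid, protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"     # new stop codon, truncated protein
    if ref_aa == "*":
        return "stop-lost"    # original stop codon removed
    return "missense"         # amino-acid change


# GAG -> GTG (glutamate to valine) is an amino-acid change.
print(classify_substitution("GAG", 1, "T"))  # missense
# CAG -> TAG turns a glutamine codon into a stop codon.
print(classify_substitution("CAG", 0, "T"))  # nonsense
```

A classification like this says only what happens to the protein sequence. Whether the change is pathogenic still has to be established by functional assays.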

To determine if a variant is pathogenic, O’Donnell-Luria needs to use well-validated, high-throughput, functional assays that model the effect of gene ‘spelling’ on its protein, and these are not yet widely available. “My hope is that in 5 to 10 years we can order functional assays for some of these variants of uncertain significance, just like we are now able to order exome sequencing from a diagnostic lab,” she says.

A range of V2F approaches

Engreitz highlights some recent developments that are helping V2F research. “Gene-editing tools can introduce single nucleotide changes in precise locations in the genome, opening a range of new opportunities for systematically editing in thousands of variants to characterize their effects in cells,” he says. Up to two-thirds of the genetic variations implicated in human disease are point mutations, caused when one base pair is switched for another. Reversing that switch could reveal gene functions and new mechanisms of disease, and lead to treatments or even cures for genetic disease.

Not all variants are in genes, however. Genome-editing tools can also be used to disrupt enhancers, which can modulate gene expression. Engreitz’s team has combined CRISPR technology and gene expression data to examine more than 3,000 enhancer-gene pairs. The team was able to derive a surprisingly simple equation to predict how a particular enhancer will affect gene expression, based on the enhancer’s activity and the contact frequency between the enhancer and the gene promoter [2]. These advances bring researchers closer to understanding the effects of variants in non-coding regions on downstream gene expression.
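The article does not spell out the equation. As an illustration, the sketch below assumes the ratio form usually described for the team's Activity-by-Contact model: an enhancer's predicted share of a gene's regulation is its activity multiplied by its contact frequency with the promoter, divided by the sum of that product over all candidate elements near the gene. The function name `abc_scores` and the input numbers are hypothetical, and how activity and contact are measured is left outside the sketch.

```python
def abc_scores(activity, contact):
    """Score each candidate element for one gene (illustrative sketch).

    activity: per-element activity estimates (arbitrary units).
    contact:  per-element contact frequency with the gene's promoter.
    Returns activity * contact for each element, normalized so that
    the scores for the gene sum to 1.
    """
    products = [a * c for a, c in zip(activity, contact)]
    total = sum(products)
    if total == 0:
        return [0.0 for _ in products]
    return [p / total for p in products]


# Three made-up candidate elements near one gene.
scores = abc_scores(activity=[2.0, 4.0, 2.0], contact=[0.5, 0.25, 1.0])
print(scores)  # [0.25, 0.25, 0.5]
```

In this toy input the third element has the same activity as the first but contacts the promoter twice as often, so it receives twice the score.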

Then there are variants with effects that are confined to specific cell types. “Single-cell approaches, such as single-cell RNA sequencing and single-cell ATAC-seq [assay for transposase-accessible chromatin using sequencing], will be transformative for understanding the contribution of different cell types to disease,” says Engreitz. These methods are helping researchers to catalogue cell types and states in both healthy and diseased tissues, and determine which cell models are most relevant for V2F analysis.

For example, a recent study of the gene expression profile of gut tissue cells found that changes associated with inflammatory bowel disease were limited to a small number of the 51 cell subtypes present. Researchers observed that only inflammatory fibroblasts, inflammatory monocytes, microfold-like cells and certain T cells expanded with disease [3]. Limiting the number of cell types that have to be examined will help speed up the V2F process.

From omics to drugs

Genomic studies are informing drug development across medicine. Lindgren gives examples of cystic fibrosis drugs that target particular mutations in the CFTR protein, and cholesterol-lowering drugs that inhibit PCSK9, mimicking a genetic variation in humans that lowers LDL cholesterol. “Genetic discoveries are enabling the development of drugs that target the root cause of disease,” she says.

The genome and transcriptome lay the foundation for the V2F journey. Thanks to advances in technologies to study gene expression regulation (epigenomics), proteomics and metabolomics, researchers can study cellular processes as never before. “The ‘omics pipeline’ gives us a comprehensive toolbox to address questions that cannot be addressed by genomics alone,” explains Edouard Nice, Head of Clinical Biomarker Discovery and Validation at Monash University in Melbourne, Australia. “Such analyses are particularly relevant to cancer, where genetic variants have been associated with a number of tumour types, and will assist in the roll-out of precision, or personalized, medicine.”

A proteogenomic analysis of human colon cancer, published in 2019 by the Clinical Proteomic Tumour Analysis Consortium, found that increased phosphorylation of the retinoblastoma protein was associated with higher cancer cell proliferation and decreased apoptosis. The authors also showed that increased glycolysis in cancer cells correlated with immune system evasion by tumours carrying a high number of mutations, which are typically considered to be more susceptible to cancer-killing T cells [4]. “This study highlights the potential of multi-omics data to identify the function of genetic variants and new therapeutic opportunities,” says Nice.

Since most common complex diseases develop gradually and are also influenced by environmental factors, omics data should be gathered at multiple time points, says Nice. Ongoing initiatives to collect and store omics data from patients with different diseases, such as the Roadmap Epigenomics Project and the UK Biobank, are setting the stage for systematic analyses of the molecular changes that occur during disease. Further investigation of these changes in disease-relevant systems, including organoids, cell lines and genetically engineered animals, will allow researchers to understand the function of protein variants in the patient's disease process, and, ultimately, develop personalized treatments.

As V2F work gains momentum, it is important that researchers share knowledge and work together to accelerate understanding and, ultimately, treatment of common complex genetic diseases. “It’s such an exciting time,” says Lindgren. “So full of promise and opportunity.”

Learn more about going from variant to function by watching this Nature Genetics webinar.


  1. Sankaran, V.G. et al. Science 322, 1839–1842 (2008).

  2. Fulco, C.P. et al. Preprint on bioRxiv (2019).

  3. Smillie, C.S. et al. Cell 178, 714–730 (2019).

  4. Vasaikar, S. et al. Cell 177, 1035–1049.e19 (2019).
