Image: illustration of a magnifying glass looking at a strand of DNA isolated on a dark blue background. Credit: Adapted from Getty

The human genome contains some 40,000 protein-coding and non-coding genes. They’re not all studied equally, however. Although scientists can now survey thousands of genes at a time to find those associated with a given trait, they still tend to focus on the same genes that were popular even before the Human Genome Project was completed, more than 20 years ago.

A pair of tools aims to flag interesting but neglected human genes to researchers who might be looking for genetic diamonds in the rough.

One tool, called Find My Understudied Genes (FMUG), emerged from a study published in March [1]. The study first explores why interesting but relatively under-researched genes are not highlighted in genetic surveys, and then offers FMUG as a remedy.

The second tool, the Unknome database, was created by a team led by Matthew Freeman at the University of Oxford, UK, and Sean Munro at the MRC Laboratory of Molecular Biology in Cambridge, UK, and was described [2] in 2023.

“We are in the lucky position to know what we don’t know,” says Thomas Stoeger, a biologist at Northwestern University in Chicago, Illinois, and co-author of the FMUG study.

Given a set of genes, the Unknome database identifies orthologues (genes with common ancestry) in other species, then counts the number of published findings on each gene and its relatives, weighted by the strength of the evidence behind each finding. Users can rank genes by how well studied they are.
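The counting-and-weighting idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the Unknome database's actual method: the evidence categories, their weights, the placeholder gene names and the choice to pool a gene's findings with those of its orthologues into a single sum are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical evidence weights; the real database's scheme may differ.
EVIDENCE_WEIGHTS = {"experimental": 1.0, "computational": 0.5, "inferred": 0.25}


@dataclass(frozen=True)
class Finding:
    gene: str      # gene the finding was reported for
    evidence: str  # key into EVIDENCE_WEIGHTS


def study_score(gene, orthologues, findings):
    """Weighted count of findings for a gene and its listed orthologues."""
    family = {gene, *orthologues.get(gene, ())}
    return sum(EVIDENCE_WEIGHTS[f.evidence] for f in findings if f.gene in family)


def rank_least_studied(genes, orthologues, findings):
    """Sort genes from least to most studied."""
    return sorted(genes, key=lambda g: study_score(g, orthologues, findings))


# Made-up example data (placeholder names, not real genes).
orthologues = {"GENE_A": ["GENE_A_FLY"], "GENE_B": []}
findings = [
    Finding("GENE_A", "experimental"),
    Finding("GENE_A_FLY", "computational"),
    Finding("GENE_B", "inferred"),
]
ranking = rank_least_studied(["GENE_A", "GENE_B"], orthologues, findings)
```

In this toy data set, GENE_B ranks as less studied because its only finding carries the lowest weight, whereas GENE_A benefits from evidence on its fly relative.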

FMUG helps people to narrow a list of human genes (such as possible targets from genome-scale sequencing studies) using various filters, including the gene’s popularity in the published literature. Stoeger’s lab used a prototype of the software to focus its efforts on an ageing-associated gene called SFPQ. Subsequent work allowed the team to document that decreased expression of SFPQ caused effects similar to ageing [3]. “There’s a lot of excitement in the ageing community about that finding,” Stoeger says.

Pattern of neglect

There are plenty of reasons why some genes are studied more than others. One obvious possibility is that some sequences simply have weaker links to diseases and are therefore less ‘interesting’ to researchers and funding bodies. But studies have found little correlation between the strength of evidence for a gene and the number of papers that are published about it. Stoeger and his colleagues found [1], for example, that 44% of the genes that the US National Institutes of Health had identified as promising targets for Alzheimer’s disease hadn’t been mentioned in the titles or abstracts of any papers on Alzheimer’s.

The team considered three other explanations for the lack of studies: that understudied genes don’t turn up in the genomic surveys as ‘hits’ related to traits or diseases; that study authors fail to highlight the understudied genes; or that authors of follow-up papers fail to study understudied genes that genomic papers have highlighted. By analysing 909 genome-wide surveys collected from 4 study databases and thousands of papers on specific genes that cited those surveys, the authors found support for the second hypothesis: of 18,295 genetic hits identified in 148 surveys of gene-expression data included in the analysis, just 161 were mentioned in the study title or abstract. Those that were mentioned tended to be already well studied in the literature.
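To put those figures in proportion, the share of hits that reached a title or abstract can be worked out directly. The snippet below uses only the two numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the text for the 148 gene-expression surveys.
total_hits = 18_295
mentioned_hits = 161

# Fraction of hits that were named in a study title or abstract.
share_mentioned = mentioned_hits / total_hits
print(f"{share_mentioned:.2%} of hits were mentioned")  # 0.88% of hits were mentioned
```

In other words, fewer than 1 in 100 of the genes flagged by these surveys were singled out where most readers would see them.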

The team then asked why authors of genomics studies might choose to highlight certain genes over others. One reason, they found, was the availability of research reagents specific to those genes. Another was the number of existing papers on the genes. It’s a self-reinforcing loop, says Reese Richardson, a biologist at Northwestern University and a co-author of the paper. If not many people study a gene, others might not develop the tools to study it, so it remains unstudied.

Some genes might also be understudied for sociological reasons, says Freeman. “There’s this tension between wanting to be pioneers and explorers, and on the other hand, feeling safe in little social groups where you get recognition and kudos, and you’re confident that your paper will be reviewed by a buddy of yours who works on something similar,” he says. Intriguingly, the Northwestern researchers found that papers on understudied genes receive more citations than other papers.

Users of FMUG, which is available for Windows, macOS and iOS, import a list of genes and can then apply any of about 300 filters to highlight, say, genes with fewer than a certain number of associated papers. Other filters include the existence of certain tools for studying the genes, or the availability of a mouse homologue. “With a few minutes of work, you might find something that you are able to study that others are not,” Stoeger says.
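The filtering workflow described above amounts to applying a stack of yes-or-no tests to each gene in a list. The following Python sketch shows that pattern under stated assumptions: it does not use FMUG's code or data, and the record fields, filter names, gene symbols and paper counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    symbol: str                # placeholder gene symbol
    paper_count: int           # number of associated papers (made-up values)
    has_mouse_homologue: bool  # whether a mouse homologue is known


def fewer_papers_than(limit):
    """Build a filter that keeps genes with fewer than `limit` papers."""
    return lambda gene: gene.paper_count < limit


def with_mouse_homologue(gene):
    """Filter that keeps genes with a known mouse homologue."""
    return gene.has_mouse_homologue


def apply_filters(genes, filters):
    """Keep only the genes that pass every filter."""
    return [g for g in genes if all(f(g) for f in filters)]


# Made-up gene list standing in for hits from a genome-scale study.
hits = [
    GeneRecord("GENE_1", paper_count=2500, has_mouse_homologue=True),
    GeneRecord("GENE_2", paper_count=4, has_mouse_homologue=True),
    GeneRecord("GENE_3", paper_count=7, has_mouse_homologue=False),
]
shortlist = apply_filters(hits, [fewer_papers_than(10), with_mouse_homologue])
```

Here only GENE_2 survives both filters: it has few papers and can be followed up in mice. Writing each filter as a small function keeps them easy to combine in any order.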

Researchers at the University of Copenhagen used FMUG to show that extremely compact ‘intrinsically disordered’ proteins tend to be understudied relative to those with more common characteristics [4]. “I found the tool useful simply because it enables us to highlight these kinds of disparities,” Giulio Tesei, one of the paper’s co-authors, says.

And this month, researchers posted a paper [5] on the bioRxiv preprint server highlighting understudied genes in the roundworm, Caenorhabditis elegans. The authors compiled 432 tables from 112 papers listing genes that silence RNA in the roundworm, then listed the genes from those tables that didn’t appear often (fewer than 10 times) in the main text of the 112 articles or other papers.
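The mention-counting step described above can be approximated with a simple word count. This Python sketch is an assumption-laden illustration rather than the preprint's pipeline: the function name, the tokenizing rule and the example gene names are hypothetical, and only the threshold of 10 comes from the text.

```python
import re
from collections import Counter


def rarely_mentioned(table_genes, article_texts, threshold=10):
    """List genes from the tables that appear fewer than `threshold` times in the texts."""
    mentions = Counter()
    for text in article_texts:
        # Treat runs of letters, digits and hyphens as one token, so a
        # hyphenated gene name stays whole (a simplifying assumption).
        mentions.update(re.findall(r"[A-Za-z0-9-]+", text))
    return sorted(g for g in set(table_genes) if mentions[g] < threshold)


# Made-up article text and placeholder gene names.
texts = ["gene-1 " * 12, "Here gene-2 is reported alongside gene-1."]
neglected = rarely_mentioned(["gene-1", "gene-2", "gene-3"], texts)
```

With this toy input, gene-1 is mentioned 13 times and drops out, whereas gene-2 (one mention) and gene-3 (none) are flagged as rarely discussed.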

Freeman says that beyond their practical benefits, such tools are valuable because they raise awareness of blind spots in the published literature. There’s a high cost to such neglect, he says: understudied genes can have key roles in fundamental biology, disease aetiology and drug discovery. And for budding researchers, there’s another benefit, he adds. In establishing a research group, scientists need to identify a topic to call their own. “The big challenge is, how do I find my own niche?”