6.6 million — that’s how many spots on the human genome Sekar Kathiresan looks at to calculate a person’s risk of developing coronary artery disease. Kathiresan has found that combinations of single DNA-letter differences from person to person in these select locations could help to predict whether someone will succumb to one of the leading causes of death worldwide. It’s anyone’s guess what the majority of those As, Cs, Ts and Gs are doing. Nevertheless, Kathiresan says, “you can stratify people into clear trajectories for heart attack, based on something you have fixed from birth”.
Kathiresan, a geneticist at Massachusetts General Hospital in Boston, isn’t alone in counting outrageously high numbers of variants. The polygenic risk scores he has developed are part of a cutting-edge approach in the hunt for the genetic contributors to common diseases. Over the past two decades, researchers have struggled to account for the heritability of conditions including heart disease, diabetes and schizophrenia. Polygenic scores add together the small — sometimes infinitesimal — contributions of tens to millions of spots on the genome, to create some of the most powerful genetic diagnostics to date.
This approach has taken off thanks to a number of well-resourced cohort studies and large data repositories, such as the UK Biobank (see pages 194, 203 and 210), which collect vast quantities of health information alongside DNA data from hundreds of thousands of people. And some studies published in the past year or so have been able to analyse more than a million participants by combining information from such sources, increasing scientists’ ability to detect tiny effects.
Supporters say that polygenic scores could be the next great stride in genomic medicine, but the approach has generated considerable debate. Some research presents ethical quandaries as to how the scores might be used: for example, in predicting academic performance. Critics also worry about how people will interpret the complex and sometimes equivocal information that emerges from the tests. And because leading biobanks lack ethnic and geographic diversity, the current crop of genetic screening tools might have predictive power only for the populations represented in the databases.
“Most people are keen to have a decent debate about this, because it raises all sorts of logistical and social and ethical issues,” says Mark McCarthy, a geneticist at the University of Oxford, UK. Even so, polygenic scores are racing to the clinic and are already being offered to consumers by at least one US company.
Peter Visscher, a geneticist at the University of Queensland, Australia, who pioneered the methods that underlie the trend, is broadly optimistic about the approach, but is still surprised by the speed of progress. “I’m absolutely convinced this is going to come sooner than we think,” he says.
When researchers completed the first drafts of the human genome in the early 2000s, many expected that it would mark the start of a medical revolution. Geneticists started searching for the differences that might explain why one person develops diabetes or heart disease whereas another does not. The idea was simple: compare a group of people with the condition to a group without and look for differences in their DNA. The variations generally came in the form of DNA-letter swaps, known as single nucleotide polymorphisms, or SNPs. If people with a condition tended to have a T at a certain location whereas others had a C, that suggested that the SNP was associated in some way with the disease.
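The comparison described above can be sketched for a single SNP. This is a minimal illustration, not the method of any study mentioned here: the allele counts are invented, the function name `allele_association` is hypothetical, and real GWASs apply far stricter significance thresholds and adjust for factors such as ancestry.

```python
# Single-SNP case-control comparison (illustrative sketch only).
# All counts below are made up; they do not come from any study in the article.

def allele_association(case_t, case_c, control_t, control_c):
    """Return (odds_ratio, chi_square) for a 2x2 table of allele counts.

    Rows are cases and controls; columns are the T and C alleles.
    """
    # Odds of carrying T rather than C among cases, relative to controls.
    odds_ratio = (case_t * control_c) / (case_c * control_t)

    total = case_t + case_c + control_t + control_c
    row_cases = case_t + case_c
    row_controls = control_t + control_c
    col_t = case_t + control_t
    col_c = case_c + control_c

    observed = [case_t, case_c, control_t, control_c]
    # Counts expected if the allele were unrelated to disease status.
    expected = [
        row_cases * col_t / total,
        row_cases * col_c / total,
        row_controls * col_t / total,
        row_controls * col_c / total,
    ]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return odds_ratio, chi_square


# Hypothetical SNP: T is somewhat more common among people with the condition.
odds_ratio, chi_square = allele_association(
    case_t=600, case_c=400, control_t=500, control_c=500
)
print(f"odds ratio: {odds_ratio:.2f}, chi-square: {chi_square:.2f}")
```

An odds ratio above 1 with a large chi-square statistic would flag the SNP as associated with the condition, which is not the same as showing that it causes it.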
These genome-wide association studies, or GWASs as they came to be known, became very popular. But after years of searching, scientists could still explain only a small fraction of the inherited risk for common diseases. It turned out that most of these conditions were related to many more SNPs than scientists had first expected, says Ali Torkamani, a geneticist at the Scripps Research Institute in La Jolla, California.
Worse still, a majority of the variants conferred a very small risk, detectable only when surveying huge groups of people. “We didn’t have the sample size to really drive prediction as well as some people naively thought,” says Ewan Birney, director of the European Bioinformatics Institute in Hinxton, UK. By 2007, geneticists were fretting about something they called “missing heritability”. It was clear that many of these conditions had a genetic component, but GWASs weren’t catching much of it.
Today, things are changing. With access to massive data sets, as well as advances in how data are analysed, scientists are getting better at measuring those very small risks, says Kathiresan.
A prime example is the technique Kathiresan used to generate his 6.6-million-SNP score, which was published in August (ref. 1). He and his team took data from a 2015 meta-analysis that combined 48 GWASs, consisting of 61,000 people with coronary artery disease and 120,000 controls (ref. 2). They then tested their polygenic predictor on 290,000 people in the UK Biobank, finding that those scoring in the highest few percentiles had on average several times higher risk of developing the disease than did the rest of the population (see ‘The multi-gene prediction tools’). Of the 23,000 people who received the highest scores, for example, 7% had coronary artery disease, compared with 2.7% of the remaining population. The group conducted similar analyses for four other disorders, including inflammatory bowel disease and breast cancer, each time identifying a group who scored in the top few percentiles and were at particularly high risk.
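The figures quoted above can be turned into a rough back-of-the-envelope comparison. The sketch below uses only the rounded numbers given in the text, so the results are approximate; treating the 7% and 2.7% as simple prevalences is a simplifying assumption, and the variable names are illustrative.

```python
# Rough arithmetic on the rounded figures quoted in the text.
tested = 290_000          # UK Biobank participants scored
top_group = 23_000        # people who received the highest scores
top_prevalence = 0.07     # share of the top group with coronary artery disease
rest_prevalence = 0.027   # share of everyone else with the disease

# What fraction of the tested population falls in the high-score group?
top_share = top_group / tested

# How much more common is the disease in the high-score group?
relative_risk = top_prevalence / rest_prevalence

# Approximate number of people with the disease in each group.
top_cases = top_group * top_prevalence
rest_cases = (tested - top_group) * rest_prevalence

print(f"high-score group: {top_share:.1%} of those tested")
print(f"relative risk versus the rest: {relative_risk:.1f}x")
print(f"approximate cases: {top_cases:.0f} (top) vs {rest_cases:.0f} (rest)")
```

On these rounded numbers, the high-score group is roughly 8% of those tested and has about 2.6 times the disease rate of everyone else, while most people with the disease still fall outside that group.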
The paper has drawn praise from some researchers as a demonstration that polygenic risk scores could, in theory, be used in the clinic. The ability of the scores to identify high-risk groups, Kathiresan says, parallels existing measures of risk used in medicine. “Essentially what you have is a new risk factor for coronary artery disease.”
Kathiresan’s work made headlines and triggered some controversy, owing to the sheer number of variants included in the risk score. Only a fraction of those 6.6 million SNPs actually contribute to the prediction, says biostatistician Nilanjan Chatterjee from the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland, who was not involved in the study. This is because of how these kinds of scores are calculated: data for all the variants are fed into an algorithm, which assigns a weight to each one according to how strongly it is related to the disease, and most variants receive weights so small that they contribute little or nothing to the predicted risk.
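The weighting idea can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the algorithm used in the study: it assumes each variant is coded as the number of risk alleles a person carries (0, 1 or 2) and that the score is a simple weighted sum. The function name `polygenic_score`, the weights and the genotypes are all invented for illustration.

```python
# Toy polygenic score: a weighted sum over variants (illustrative only).
# Assumption: each genotype is coded as a risk-allele count of 0, 1 or 2.
# The weights are hypothetical and not taken from any published score.

def polygenic_score(allele_counts, weights):
    """Sum each variant's risk-allele count multiplied by its weight."""
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per variant")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Four made-up variants: one with a sizeable weight, one modest,
# one with zero weight and one that is almost negligible.
weights = [0.12, 0.05, 0.0, 0.001]

person_a = [2, 1, 0, 1]   # carries risk alleles at the heavily weighted sites
person_b = [0, 0, 2, 2]   # carries risk alleles only at near-zero-weight sites

score_a = polygenic_score(person_a, weights)
score_b = polygenic_score(person_b, weights)
print(f"person A: {score_a:.3f}, person B: {score_b:.3f}")
```

In this toy example, person B carries as many risk alleles as person A but ends up with a score close to zero, which is the sense in which most variants in a very large score add little to the prediction.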
Many researchers, including Chatterjee, say that it doesn’t matter if many variants with minimal effect are included. But others worry that including millions of variants that don’t do anything could undermine public trust in the scores. Cecile Janssens, an epidemiologist at Emory University in Atlanta, Georgia, says she is not impressed by the study. One of her concerns is that the millions of variants used to calculate the final score didn’t improve performance by much compared with a score made from just 74 SNPs with the strongest links to disease. If these sorts of scores are going to be used clinically, she says, “the credibility of the score is also important.”
Course of action
Whereas Kathiresan’s study focused mainly on genetic risk, others are looking at how the polygenic scores might complement existing measures of risk. In 2013, Samuli Ripatti, a statistical geneticist at the University of Helsinki, found that combining a polygenic risk score with conventional risk factors for coronary artery disease, such as high body-mass index and elevated blood pressure, improved predictions of who would develop the disease (ref. 3). He was also able to identify a group of people with high genetic risk scores who would otherwise have been considered to be at only intermediate risk. Ripatti says that this ability to pick out individuals who fly under the radar is the biggest benefit of polygenic risk scores.
Genetic risk scores could also improve screening regimes for diseases such as breast cancer. In the United States, women are currently advised to start getting mammograms from the age of 50, but if younger at-risk women could be identified, they might benefit from earlier screening. In 2016, Chatterjee developed a model for breast cancer that incorporated both conventional risk factors and a polygenic score calculated from around 90 SNPs (ref. 4). On the basis of these scores, he predicted that 16% of women aged 40 have a risk equivalent to that of the average 50-year-old, suggesting that they could benefit from screenings starting at 40. The team is now testing its model in other data sets and with a larger number of SNPs, to see whether the predictions hold up.
Meanwhile, personalized-medicine company Myriad Genetics in Salt Lake City, Utah, has already begun to include a polygenic risk score for breast cancer in the results it provides to some women. Only about 10% of women with a family history of breast cancer have one of the harmful single-gene mutations associated with the disease, so the company is now returning a score to the remaining 90% that tells them their likelihood of developing breast cancer according to a combination of polygenic risk and factors such as history and lifestyle. One of the strengths of these scores is that they provide a result for everyone, says Jerry Lanchbury, Myriad’s chief scientific officer. Although the current focus is on identifying women who are at high risk, in the future he could see the scores being used to find those who are at lower-than-average risk, who might potentially benefit from having less-frequent mammograms. “We start to enter a world where you can provide a precision-medicine result for everyone,” Lanchbury says.
All in the statistics
One complaint about polygenic scores is that they throw out biology in favour of statistics. Polygenic scores alone won’t provide much insight for drug development, but the studies can provide a starting point for delving into the individual variants and working out which genes they affect and the mechanisms that might lead to disease.
Part of that insight will come from disentangling which variants actually produce a given trait or disease, and which are just along for the ride. A SNP that is associated with a disease isn’t necessarily its cause: it could simply be that the variant tends to be inherited alongside another part of the genome that is directly involved. For example, Kathiresan estimates that only about 6,000 of his 6.6 million SNPs are causally related to coronary artery disease. As sample sizes get larger, it becomes easier to tease these variants apart, says McCarthy.
There is also still a significant portion of genetic risk that current studies can’t account for. Ripatti estimates that 30–50% of the risk for many common diseases is genetic; much of the rest is determined by environmental factors. But the problem of missing heritability remains: as a rule of thumb, GWASs can currently account for about one-third to two-thirds of the inherited risk of disease, says Visscher. As sample sizes get larger, researchers will probably find more variants that contribute to the risk, says Torkamani, although the returns diminish. “At some point, you’re just going to stop getting too much utility from additional genetic risk factors,” he says. More of the genetic risk might also be picked up by whole-genome sequencing, adds Visscher. Currently, GWAS research is conducted mainly using arrays that genotype only a selected portion of the genome, but as whole-genome sequencing becomes cheaper and more widespread, less-common variants that contribute to disease might become easier to find.
From lab to clinic
Kathiresan says he hopes to have a score for coronary artery disease on the market in the next year. But most researchers acknowledge that there are obstacles to overcome before these scores can be used widely. The number one hurdle, says McCarthy, is applying them to different populations. The risk scores are generated and validated in data sets made up mainly of people with European ancestry, such as the UK Biobank, limiting the extent to which they can be applied to people of other ethnicities. Myriad’s score, for example, is currently available only to individuals with a European background, although Lanchbury says that the company is in the process of developing a similar score for African American women. McCarthy says that the ultimate aim is to generate risk scores that are specific to ethnicity.
Ethnicity isn’t the only complicating factor, Birney adds. The populations analysed in the studies come from specific health-care systems, and their experiences don’t necessarily translate across countries. The chance of having a heart attack could vary between the United Kingdom and United States, for example, as could the standards of care. So scores might not be translatable.
Even the simple act of communicating these scores to people brings with it a number of concerns. Doctors are not necessarily trained in genetics, says McCarthy, and “there aren’t enough genetic counsellors on the planet” to conduct the nuanced discussions that genetic risk scores will entail. There is a popular misconception that because our genetics doesn’t change, “it’s somehow a destiny that will be fulfilled”, says Birney. Janssens worries that if people think that the chance of getting a disease is hard-wired into their DNA, they won’t be motivated to do anything about it.
The concern becomes even more acute for non-disease traits that might be predicted by such scores. A study of more than 1 million people published earlier this year developed a polygenic score that essentially correlates with how long people stay in education (ref. 5). The authors of that study went to great lengths to clarify that they were not suggesting any kind of intervention for people who have extremely low scores. “Any practical response, individual or policy-level, to this or similar research would be extremely premature,” they write.
Michelle Meyer, a bioethicist at Geisinger Health System and a co-author on the study, says that the score simply isn’t actionable. Without understanding the biological differences represented by the score, or the environmental and social factors bound to interact with those differences, it’s impossible to know how to intervene.
Understanding how people will react to polygenic scores is a high priority for researchers. Ripatti and his colleagues have given more than 7,000 individuals in Finland information about their likelihood of developing heart disease, based on both polygenic scores and conventional risk factors such as high blood pressure. Most of the respondents say that getting this information motivates them to make positive changes, says Ripatti. Preliminary results suggest that those with high genetic risk are the most likely to take actions such as losing weight or stopping smoking.
In nearby Estonia, researchers are in the process of genotyping 100,000 individuals, adding to the 50,000 the country has already sampled. And unlike in many other biobanks, participants in the Estonian project can sign up to receive feedback. Among the results being returned to them are polygenic risk scores for type 2 diabetes and cardiovascular disease, says Lili Milani, a geneticist at the Estonian Genome Center at the University of Tartu, Estonia. As in the Finnish work, participants are shown graphs of how lifestyle changes could reduce or increase their risk. And, says Milani, initial indications are that people are glad of the advice.
For now, people are receiving their scores from genetic counsellors. But Milani is working with the Estonian government to work out how to integrate genomic data into the health-care system, so that it can be used every day by doctors. The country ultimately aims to genotype anyone who’s interested, right up to its entire population of 1.3 million, Milani says. “The goal is to build something so great that all doctors will want to recommend it and all of the population will want it.”
Nature 562, 181-183 (2018)