Rare versus common diseases: a false dichotomy in precision medicine

Precision medicine initiatives are being launched worldwide, each with the capacity to sequence many thousands to millions of human genomes. At the strategic planning level, all are debating the extent to which these resources will be directed towards rare diseases (and cancers) versus common diseases. However, these are not mutually exclusive choices. The organizational and governmental infrastructure created for rare diseases is extensible to common diseases. As we will explain, the underlying technology can also be used to identify drug targets for common diseases with a strategy focused on naturally occurring human knockouts. This flips on its head the prevailing modus operandi of studying people with diseases of interest, shifting the onus to defining traits worth emulating by pharmaceuticals, and searching phenotypically for people with these traits. This also shifts the question of what is rare or common from the many underlying causes to the possibility of a common final pathway.


INTRODUCTION
The 100,000 Genomes Project led by Genomics England has been a huge success, judged not only by its scientific publications 1 but also by its impact on the National Health Service (NHS). Since 2019, the NHS has offered genome sequencing as part of healthcare, and the plan is to sequence five million individuals over the next 5 years 2 . This has inspired similar initiatives worldwide, even in middle-income countries like Thailand 3 . Many are focusing on rare diseases, or to a lesser extent cancers. Others are studying the general population and/or building infrastructure (see Table 1). This reflects a longstanding categorization of medical disorders as rare diseases of primarily monogenic etiology versus common diseases of complex multifactorial etiology, where most of the healthcare spending resides. These projects all envision a future of precision medicine (PM) where the availability of more data (not necessarily always genomes) improves our ability to diagnose, treat, and prevent diseases. With limited resources, debates on where to begin are inevitable. However, such debates rest on a false dichotomy, i.e., that by starting with rare diseases we have forsaken our obligation to address common diseases. To the contrary, what we build and what we learn by implementing PM for rare diseases is extensible to common diseases, serving not only the immediate goal of better diagnoses but also the long-term challenge of identifying drug targets for common diseases.

RARE DISEASES FOR THE SHORT TERM
First, what are rare diseases? In the United States, a rare disease is defined as a condition that affects fewer than 200,000 people, or 1 in 1650 people given a current population size of 330 million. This definition is based on the Orphan Drug Act of 1983. In the European Union, rare is defined as fewer than 1 in 2000 people. Most of these diseases present in children, but some present in adults. Although rare in isolation, they are not rare in aggregate. The oft-cited number is that they affect 7% of the population (see Box 1). Most of these diseases are attributed to a single defective gene, i.e., Mendelian, and the identity of this gene is known for many thousands of diseases. The argument for rare diseases is not just that they are better understood. Health economics are more favorable 4 . Because they are so rare, few physicians are trained to recognize them. Hence, they are poorly diagnosed. Affected individuals often endure years of diagnostic odyssey, which is not only fruitless but more expensive than sequencing their genomes upfront 5,6 . For infants admitted to intensive care within the first 100 days of life, sequencing produced diagnostic yields of 36.7%; and in 52.0% of the diagnosed, medical management was affected 7 . Results improved to 50.8% and 71.9%, respectively, when trio sequencing was conducted. Other studies have given similar results 8 .
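The figures quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope illustration: the function names are ours, and multiplying the diagnostic yield by the fraction of diagnosed infants whose management was affected assumes that the second percentage is conditional on having received a diagnosis, as the text states.

```python
# Back-of-envelope check of the numbers quoted in the text.
# All function and variable names are illustrative, not from the cited studies.

US_THRESHOLD_CASES = 200_000      # Orphan Drug Act threshold (from the text)
US_POPULATION = 330_000_000       # population size used in the text


def one_in_n(cases: int, population: int) -> float:
    """Express a case count as '1 in N' for a given population."""
    return population / cases


def management_changed_share(diagnostic_yield: float,
                             changed_given_diagnosis: float) -> float:
    """Share of ALL sequenced infants whose management was affected,
    assuming the second figure is conditional on a diagnosis."""
    return diagnostic_yield * changed_given_diagnosis


us_one_in = one_in_n(US_THRESHOLD_CASES, US_POPULATION)
singleton = management_changed_share(0.367, 0.520)   # proband-only sequencing
trio = management_changed_share(0.508, 0.719)        # trio sequencing

print(f"US rare-disease threshold: 1 in {us_one_in:.0f}")
print(f"Management affected, singleton: {singleton:.1%} of sequenced infants")
print(f"Management affected, trio:      {trio:.1%} of sequenced infants")
```

Under these assumptions, medical management was affected for roughly one in five sequenced infants, rising to more than one in three with trio sequencing.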
At its heart, PM is about making better diagnoses (see Fig. 1) using the latest technologies to gather more data 9 and letting that guide our subsequent decisions. To transition from research to routine healthcare requires input from many stakeholders. Every jurisdiction has its own challenges. A good example of how this might be done is the Melbourne Genomics Health Alliance 10 . To diagnose rare diseases, we need sequencing machines, high-throughput computers, and a multi-disciplinary team to manage and interpret the outputs. Most of the costs are in salaries for skilled experts. Much as the invention of magnetic resonance imaging resulted in the creation of specialized referral facilities to acquire and interpret the data, a similar arrangement is used in PM. The referring physician ultimately gets a diagnosis from another physician at the referral facility. Occasionally, the two physicians interact to gather more data before a final diagnosis can be made. Additional experiments are sometimes required to validate novel gene and/or mutation functions, although this is being ameliorated by large-scale phenotyping efforts 11 . The bottleneck, however, is in the training and certification of these multi-disciplinary teams.
To what extent do the lessons of creating such referral facilities for rare diseases transfer to common diseases? Historically, medical progress has often entailed splitting of a disease into a series of sub-diseases, each treated differently. It is not inconceivable that PM will eventually transform any common disease into a series of rare diseases. How we stratify into sub-diseases is still to be determined, and it need not always be genetic, let alone monogenic. PM is simply accelerating this process, with complex data sets that require multi-disciplinary teams to manage and interpret. Hence, the organizational lessons from diagnosing rare diseases are directly transferable to common diseases. Note also that, as we stratify by mutated genes, many therapeutics for otherwise common cancers now qualify for orphan drug status 12 . This was certainly not the intention of the orphan drug laws, and we may need to update these laws. For example, perhaps orphan drug status should be granted based on the number of patients across all indications. The sooner policymakers are warned about this growing issue, the more likely they can deal with the ramifications.

BOX 1: Lower bound for overall prevalence of rare diseases
It is very difficult to measure the prevalence for every known rare disease, not only because there are so many, but also because some are so rare that we would need to sample a very large population to obtain an accurate estimate. However, it is possible to establish a lower bound for overall prevalence by adding the numbers for all instances where prevalence has been measured. This information is available from the Orphanet website 37 , organized into mutually exclusive categories: worldwide prevalence, worldwide birth prevalence, European prevalence, and European birth prevalence. The sum is 6.38%, and sorted by categories it is 0.78%, 0.81%, 3.54%, and 1.25%, respectively. The oft-cited estimate of 7%, for example in the UK-NHS report "Generation Genome" 38 , comes remarkably close to this bound. However, combining electronic health records with genomics has identified subsets of people with distinct genetic causes for many common diseases, arguing that people with undiagnosed Mendelian diseases are more prevalent than often assumed 39 . One could therefore ask how much larger the true prevalence might be. We can extrapolate two ways. If the measured prevalences are an unbiased random sampling of rare diseases, and given that there are over 6000 rare diseases, the total would be >50%. We believe this is highly implausible. More likely, the Orphanet website contains the most common of these diseases. Given how the cumulative prevalence is clearly approaching an asymptote after just a few hundred cases (see Fig. 2), the more plausible total is unlikely to be much larger than 7%. An interesting comparison is the fraction of common multifactorial diseases that can be attributed to early-onset familial forms driven by highly penetrant rare variants. A summary of the published estimates, based on extensive genome-wide association studies and reanalyses of the data, puts this number at about 10% 40 .
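The lower bound in Box 1 and the implausible ">50%" extrapolation can be reproduced with simple arithmetic. The sketch below is illustrative only: Box 1 does not state how many diseases have a measured prevalence, so `n_measured` is a hypothetical parameter, and the extrapolation simply scales the measured sum by the ratio of total to measured diseases (the "unbiased random sampling" assumption that Box 1 goes on to reject).

```python
# Illustrative check of the Box 1 arithmetic. The four category values and the
# figure of 6000 rare diseases come from the text; everything else is assumed.

orphanet_prevalence_pct = {
    "worldwide prevalence": 0.78,
    "worldwide birth prevalence": 0.81,
    "European prevalence": 3.54,
    "European birth prevalence": 1.25,
}
TOTAL_RARE_DISEASES = 6000  # "over 6000 rare diseases"

# Lower bound: sum of the mutually exclusive categories.
lower_bound_pct = sum(orphanet_prevalence_pct.values())


def extrapolated_total_pct(measured_sum_pct: float, n_measured: int,
                           n_total: int) -> float:
    """Scale the measured sum up as if measured diseases were a random sample."""
    return measured_sum_pct * n_total / n_measured


def max_measured_for_target(measured_sum_pct: float, n_total: int,
                            target_pct: float) -> float:
    """Largest number of measured diseases for which the naive extrapolation
    still exceeds target_pct."""
    return measured_sum_pct * n_total / target_pct


print(f"Lower bound: {lower_bound_pct:.2f}%")
# Hypothetical example: 600 measured diseases ("a few hundred" in the text).
print(f"Naive total if 600 were measured: "
      f"{extrapolated_total_pct(lower_bound_pct, 600, TOTAL_RARE_DISEASES):.1f}%")
print(f"Naive total exceeds 50% if fewer than "
      f"{max_measured_for_target(lower_bound_pct, TOTAL_RARE_DISEASES, 50.0):.0f} "
      f"diseases were measured")
```

Under this naive scaling, any count of measured diseases below about 766 pushes the extrapolated total past 50%. That is consistent with the statement in Box 1 and shows why the random-sampling assumption is not credible.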

COMMON DISEASES IN THE LONG TERM
All that said, the fact remains that other than perhaps cancer, we are not ready to implement PM in routine healthcare for most common diseases. Therein lies the source of the tensions between and within PM initiatives. Ironically, the way out of this conundrum is to redirect the technology created to diagnose rare diseases towards a strategy to find drug targets for common diseases. What we outline here has its roots in a 22-year-old hypothesis on how gene losses can drive evolutionary changes 13 . It is coupled to the realization that human genetics should be a better model of drug action than animal models or cell lines 14 . For previously approved drugs, human genetics is known to be a good predictor of efficacy and adverse effects 15-17 . Given that the mode of action of most drugs is to simulate gene loss, this proposal can be encapsulated by the acronym HKMDs, or Human Knockouts as Models of Drug action. PCSK9 inhibitors, first approved in 2015, are the canonical example. They are more effective than statins at lowering serum cholesterol 18 and were inspired by a discovery that individuals with loss-of-function (LOF) mutations in PCSK9 exhibit low levels of serum LDL and abnormally good cardiovascular health 19 . Other examples are known in coronary artery diseases 20 . In 2019, a drug (romosozumab) that increases bone density was approved for osteoporosis. It was inspired by another rare LOF mutation, in SOST, where the affected individuals have bones so dense they do not break 21,22 . Although we are only aware of a small number of HKMDs (see Table 2), there are reasons to believe they are widespread (see Box 2). Historically, their discovery has been serendipitous, because human geneticists do not as a matter of practice screen for rare phenotypes. People screen themselves and report to a physician if they are sick; but rare individuals with HKMDs are not typically sick, and therefore have no reason to self-report.
There are two approaches to make HKMD discovery more systematic 23 . The genotype-first method would sequence a large number of individuals and analyze their genomes for LOFs that might create the opposite of a disease state (e.g., high bone density) or confer protection against disease (e.g., low serum LDL). As electronic health records are finite, recontact permission will be essential to confirm inferred phenotypes. This is being done with consanguineous populations 24 and at the UK Biobank 25 . The phenotype-first method would ideally screen a much larger population for hypothesized HKMD phenotypes; for example, using social media to entice individuals to self-report. Considering the multifactorial nature of common diseases, we would expect there to be many causes (not all genetic, let alone a LOF) for any given phenotype. Since the people we sequence are not sick, if we cannot identify a promising LOF in one person, we can move on to the next. Once a candidate HKMD is identified, we can use the growing human genome sequencing databases to validate the genotype-phenotype relationship across a larger number of individuals. Importantly, we can ascertain if a certain genetic background (i.e., the population in which the LOF was discovered) is necessary for that phenotype to manifest. This level of validation would be inconceivable with animal models or cell lines. Of the two approaches, the phenotype-first method is more compatible with PM facilities set up to diagnose rare diseases. Rather than identify rare mutations specific to sick individuals, they would now identify rare mutations specific to individuals with a phenotype that mimics a desired pharmaceutical objective. Anyone with the large-scale capacity to diagnose rare diseases can easily devote 10% of that capacity to screen phenotypically defined individuals for HKMDs. This flips on its head a prevailing narrative in medical genetics that views LOFs as detrimental to a small number of people. In the future, rare LOFs may be seen as key to drug development that benefits a large number of people.
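As a rough illustration of the genotype-first method described above, the following sketch ranks genes by how far a trait differs between LOF carriers and non-carriers. Everything in it is hypothetical: the cohort is toy data, the gene names are placeholders, and a plain difference of means stands in for the association statistics, phenotype confirmation by recontact, and population-level validation that a real study would require.

```python
from statistics import mean

# Hypothetical toy cohort: each person has a set of genes carrying a LOF
# variant and one measured trait (here, serum LDL in arbitrary units).
cohort = [
    {"id": "P1", "lof_genes": {"GENE_A"}, "ldl": 50.0},
    {"id": "P2", "lof_genes": {"GENE_A", "GENE_B"}, "ldl": 60.0},
    {"id": "P3", "lof_genes": {"GENE_B"}, "ldl": 130.0},
    {"id": "P4", "lof_genes": set(), "ldl": 120.0},
    {"id": "P5", "lof_genes": {"GENE_C"}, "ldl": 125.0},
    {"id": "P6", "lof_genes": set(), "ldl": 135.0},
]


def screen_genotype_first(people, trait, min_carriers=2):
    """Return (gene, carrier_mean - noncarrier_mean, n_carriers) tuples,
    sorted so the genes whose LOF carriers have the lowest trait come first."""
    genes = set().union(*(p["lof_genes"] for p in people))
    results = []
    for gene in sorted(genes):
        carriers = [p[trait] for p in people if gene in p["lof_genes"]]
        others = [p[trait] for p in people if gene not in p["lof_genes"]]
        # Skip genes with too few carriers to say anything at all.
        if len(carriers) < min_carriers or not others:
            continue
        results.append((gene, mean(carriers) - mean(others), len(carriers)))
    return sorted(results, key=lambda r: r[1])


hits = screen_genotype_first(cohort, "ldl")
for gene, delta, n in hits:
    print(f"{gene}: carriers differ by {delta:+.1f} (n={n})")
```

The phenotype-first method would reverse the order of operations: select individuals with the desired trait first, then look for rare LOFs that they share.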
DISCUSSION
Some readers will have noticed a contradiction between two of our key points. If a common disease is a series of rare diseases, might that require a series of HKMD-inspired drugs? Much has been written about the genetic and environmental architecture of complex multifactorial diseases 26,27 , and it is dangerous to generalize to all common diseases. However, to the extent that a disease has a common final pathway of phenotypic or clinical expression triggered by many different genetic and environmental factors, one HKMD-inspired drug may be effective for a large fraction of affected individuals. This certainly is the hope for PCSK9 inhibitors, although more years of data are required to see if they improve cardiovascular health under all genetic and environmental backgrounds. The bigger change that we wish to catalyze is the idea that sequencing people without the disease of interest may be a more efficient way to identify drug targets. Finding a LOF that causes a Mendelian disease does not immediately point us towards a drug target, but finding a LOF that confers a pharmaceutically desirable phenotype does. HKMDs need not be inherited. Some might be de novo mutations. Many are likely to be even rarer than the Mendelian disease alleles that have been the focus of so many fruitful studies. The challenge is to define traits worth emulating by drugs, and to phenotypically screen a very large population for people with these traits.

BOX 2: Human knockouts as models of drug action
To argue that HKMDs may be widespread is to dispel three common misperceptions. First, LOF mutations should not be tolerated in evolutionarily conserved genes. Second, the number of naturally occurring LOFs in any particular individual's genome ought to be small. Third, it is not possible to modify a complex trait in an arbitrary direction simply by inhibiting a gene/protein. Here, we argue that all three propositions are false. On the first point, systematic deletions of the Saccharomyces cerevisiae genome have long established that only one in five yeast genes are necessary for survival 41 . The human version of these experiments was done more recently. Three independent studies on human cell lines demonstrated that only 10% of our 23,425 protein-coding genes are essential for survival 42-44 . Apparently, even for evolutionarily conserved genes, selective pressures to maintain function are weak. On the second point, initial studies on 185 genomes 45 and 60,706 exomes 46 estimated that any human individual has roughly 85 to 100 heterozygous and 20 to 35 homozygous LOFs. A more recent analysis of 141,456 genomes and exomes computed the number of individuals needed to find LOFs in every gene 47 . The distribution for heterozygous LOFs peaked at ten thousand individuals, and LOFs were seen in 79.8% of the genes for this particular data set. For homozygous LOFs, the distribution peaked at a hundred million individuals, and even if we sequenced everyone in the world, four thousand genes would have no LOFs. However, drugs rarely (if ever) inhibit their targets completely; hence, the heterozygous distribution may be more appropriate for HKMDs. If so, any city with a million residents will have multiple individuals with LOFs in almost any gene that might ever be targeted for drug development. Notice, however, that many of these variants will likely be rarer than oft-studied Mendelian alleles.
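The claim about a city of a million residents can be made concrete with a small calculation. The sketch below rests on an assumption that the text does not state explicitly: that a peak at "ten thousand individuals" corresponds to a heterozygous LOF carrier frequency of about 1 in 10,000 for a typical gene. Carrier counts are modeled with a Poisson approximation, and the function names are ours.

```python
import math

# ASSUMPTION: a typical gene has heterozygous LOF carriers at about 1 in 10,000,
# our reading of the distribution "peaked at ten thousand individuals".
TYPICAL_HET_CARRIER_FREQ = 1 / 10_000
CITY_POPULATION = 1_000_000


def expected_carriers(population: int, carrier_freq: float) -> float:
    """Expected number of carriers in a population."""
    return population * carrier_freq


def prob_at_least(k: int, population: int, carrier_freq: float) -> float:
    """P(at least k carriers), using a Poisson approximation to the binomial."""
    lam = population * carrier_freq
    below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


typical = expected_carriers(CITY_POPULATION, TYPICAL_HET_CARRIER_FREQ)
print(f"Expected carriers, typical gene: {typical:.0f}")
print(f"P(>=2 carriers), typical gene: "
      f"{prob_at_least(2, CITY_POPULATION, TYPICAL_HET_CARRIER_FREQ):.6f}")

# A much rarer, hypothetical gene (1 in a million) for comparison.
print(f"P(>=2 carriers), 1-in-a-million gene: "
      f"{prob_at_least(2, CITY_POPULATION, 1e-6):.3f}")
```

Under this reading, a typical gene would have on the order of a hundred heterozygous carriers in such a city, while genes in the rare tail of the distribution may have few or none.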
On the third point, the critical determinant is the extent to which the trait of interest is regulated, with different genes that drive the trait in opposite directions. By analogy, imagine driving with feet simultaneously on the accelerator and brake. To make the car go faster or slower, one can "inhibit" the brake or accelerator, respectively. Most biological processes are indeed regulated; none more so than the complex traits underlying common diseases. To the extent that this is the case, the primary reason why there may not be an HKMD for drug development is the fact that some LOFs are not tolerated, even as heterozygotes. To a first approximation, this is equivalent to saying there is no drug target for the pharmaceuticals industry to inhibit. Other approaches are required (e.g., drugs to simulate gain-of-function).

Notes to Table 2: The column LOF indicates if the phenotype is observed in heterozygotes (T) or homozygotes (M). For FAAH, the notation T** indicates that the trait requires heterozygous mutations in two different but functionally related loci. FAAH is a promising alternative for pain-relieving drugs inspired by LOFs in SCN9A 34 . Allele frequency and effect size, when provided, come from the cited references. TG triglyceride, LDL low-density lipoproteins, CHD coronary heart disease.