Refining the impact of genetic evidence on clinical success

The cost of drug discovery and development is driven primarily by failure1, with only about 10% of clinical programmes eventually receiving approval2–4. We previously estimated that human genetic evidence doubles the success rate from clinical development to approval5. In this study we leverage the growth in genetic evidence over the past decade to better understand the characteristics that distinguish clinical success and failure. We estimate that the probability of success for drug mechanisms with genetic support is 2.6 times greater than for those without. This relative success varies among therapy areas and development phases, and improves with increasing confidence in the causal gene, but is largely unaffected by genetic effect size, minor allele frequency or year of discovery. These results indicate we are far from reaching peak genetic insights to aid the discovery of targets for more effective drugs.


SUPPLEMENT
Refining the impact of genetic evidence on clinical success

Eric Vallabh Minikel, Jeffery L Painter, Coco Chengliang Dong, Matthew R. Nelson

The supplementary figures below correspond to main and extended data figures, with Pharmaprojects filtered for only those drugs with a single target assigned ("one target only" mode). Supplementary tables are provided in a separate Excel file.

Figure S1. Impact of genetic evidence characteristics on relative success (Figure 1) in "one target only" mode. A) Proportion of target-indication (T-I) pairs with genetic support, P(G), as a function of highest phase reached.
The center is the exact proportion and error bars are Wilson 95% confidence intervals. The N is indicated at right, with the denominator being the total number of T-I pairs reaching each phase, and the numerator being the number of those that are genetically supported. B) Sensitivity of relative success (RS) from phase I-launch of T-I pairs with genetic evidence to source of human genetic association. GWAS Catalog, Neale UKBB, and FinnGen are subsets of Open Targets Genetics (OTG). The center is the point estimate of RS (see Methods) and bars are Katz 95% confidence intervals. The N is indicated at right, with the denominator being the number of T-I pairs with genetic support from each source, and the numerator being the number that are launched. Note that RS is calculated from a 2x2 contingency table and is also dependent upon the number of T-I pairs without genetic support. Total N = 7,870 T-I pairs. C) Sensitivity of RS to locus-to-gene (L2G) share threshold among OTG genome-wide association study (GWAS) associations. The minimum L2G share required for inclusion in the dataset is varied from 0.1 to 1.0 in increments of 0.05 (labels) while RS (y axis) is plotted against the number of clinical (phase I+) programs considered to have genetic support from OTG (x axis). The line represents point estimates of RS as a function of L2G threshold while shaded areas are Katz 95% confidence intervals. D) Sensitivity of RS for OTG GWAS-supported T-I pairs to binned variables: i) year in which a T-I pair first acquired human genetic support from GWAS, excluding replications and excluding T-I pairs otherwise supported by OMIM, ii) number of genes exhibiting genetic association to the same trait, iii) quartile of effect size (beta) for quantitative traits, iv) quartile of effect size (odds ratio, OR) for case/control traits standardized to be >1 (i.e., 1/OR if <1), and v) order of magnitude of minor allele frequency bins. The center is the point estimate of RS (see Methods) and bars are Katz 95% confidence intervals. The N is indicated at right, as in (B). Total N = 7,870 T-I pairs. E) Count of indications ever in development in Pharmaprojects (y axis) by the number of genes associated with traits similar to those indications (x axis).
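
As an illustration of how an RS point estimate and its Katz interval can be obtained from a 2x2 contingency table, here is a minimal Python sketch. It assumes RS is the ratio of launch proportions between genetically supported and unsupported T-I pairs, and it uses the standard Katz log-method interval for a risk ratio. The function name and the counts are hypothetical and are not taken from the study's data or code.

```python
import math

def relative_success(x_sup, n_sup, x_unsup, n_unsup, z=1.959964):
    """Risk ratio of launch (supported vs. unsupported) with a Katz log interval.

    x_sup / n_sup     : launched / total T-I pairs with genetic support
    x_unsup / n_unsup : launched / total T-I pairs without genetic support
    All four counts are assumed to be positive.
    """
    rs = (x_sup / n_sup) / (x_unsup / n_unsup)
    # Katz standard error of log(RS)
    se_log = math.sqrt(1 / x_sup - 1 / n_sup + 1 / x_unsup - 1 / n_unsup)
    lower = math.exp(math.log(rs) - z * se_log)
    upper = math.exp(math.log(rs) + z * se_log)
    return rs, lower, upper

# Hypothetical counts, for illustration only
rs, lower, upper = relative_success(x_sup=30, n_sup=200, x_unsup=300, n_unsup=4000)
print(f"RS = {rs:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the unsupported row enters both the point estimate and the standard error, two sources with the same supported numerator and denominator can still yield different RS values when the unsupported counts differ.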

Figure S2. Differences in relative success between therapy areas and the number and diversity of indications per target (Figure 2) in "one target only" mode. A-E) RS by therapy area and phase transition.
The center is the point estimate of RS (see Methods) and bars are 95% confidence intervals. The N is indicated at right, with the denominator being the number of T-I pairs with genetic support, and the numerator being the number of those that succeeded in the phase transition indicated at the top of the panel. Note that RS is calculated from a 2x2 contingency table and is also dependent upon the number of T-I pairs without genetic support. Total N = 14,307 preclinical, 7,870 ...

... areas. The largest impact is seen in "signs/symptoms", where removing the filter drops the RS from 1.91 to 1.63. The relatively minor impact of removing the genetic insight filter is consistent with the findings of King et al 15, who varied the minimum number of genetic associations required for an indication to be included, and found that risk ratio for progression (i.e. RS) was slightly diminished when the threshold was reduced.

Figure S3. Clinical investigation of drug mechanisms with genetic evidence (Figure 3) in "one target only" mode. A) Heatmap of proportion of genetically supported T-I pairs that have been developed to at least phase I, by therapy area (y axis) and gene list (x axis). B)
confidence intervals. The N is indicated at right, where the denominator is the total number of launched T-I pairs in each bin and the numerator is the number of those that are genetically supported. J) RS (y axis) vs. mean similarity among launched indications per target (x axis) by therapy area. K) RS (y axis) vs. mean count of launched indications per target (x axis).

Figure S4. Further analysis of influence of characteristics of genetic associations on relative success (Extended Data Figure 2) in "one target only" mode. A) Sensitivity of RS to the similarity threshold between the MeSH ID for the genetically associated trait and the MeSH ID for the clinically developed indication. The threshold is varied by units of 0.05 (labels) and the results are plotted as RS (y axis) versus number of genetically supported T-I pairs (x axis). B) Breakdown of OTG and OMIM RS values by whether any drug for each T-I pair has had orphan status assigned. The N of genetically supported T-I pairs (denominator) and, of those, launched T-I pairs (numerator) is shown at right. Total N = 8,780 T-I pairs, of which 3,149 are orphan. The center is the RS point estimate and error bars are Katz 95% confidence intervals. C) RS for somatic genetic evidence from IntOGen versus germline genetic evidence, for oncology and non-oncology indications. Note that the approved/supported proportions displayed for the top two rows are identical because all IntOGen genetic support is for oncology indications, yet the RS is different because the number of non-supported approved and non-supported clinical stage programs is different. In other words, in the "All indications" row, there is a Simpson's paradox that diminishes the apparent RS of IntOGen: IntOGen support improves success rate (see 2nd row) but also selects for oncology, an area with low baseline success rate (as shown in Extended Data Fig. 6A). N is displayed at right as in (B). Total N = 8,780 T-I pairs, of which 4,504 are non-oncology, 3,366 oncology, 606 targeting IntOGen oncogenes, 155 targeting tumor suppressors, and 95 targeting IntOGen genes of unknown mechanism. The center is the RS point estimate and error bars are Katz 95% confidence intervals. D) As for top panel of Figure 1D, but without removing replications or OMIM-supported T-I pairs. N is displayed as in (B). Total N = 8,780 T-I pairs. The center is the RS point estimate and error bars are Katz 95% confidence intervals. E) As for top panel of Figure 1D, removing replications but not removing OMIM-supported T-I pairs. N is displayed as in (B). Total N = 8,780 T-I pairs. The center is the RS point estimate and error bars are Katz 95% confidence intervals. F) Proportion of T-I pairs supported by a GWAS Catalog association that are launched (versus phase I-III) as a function of the year of first genetic association. G) Launched T-I pairs genetically supported by OTG GWAS, shown by year of launch (y axis) and year of first genetic association (x axis). Gene symbols are labeled for first approvals of targets with at least 5 years between association and launch. Of 60 OTG-supported launched T-I pairs (Fig. 1D), year of drug launch was available for N=24 shown here, of which 10 (42%) acquired genetic support only in or after the year of launch. The true proportion of launched T-I pairs whose GWAS support is retrospective may be larger if the T-I pairs with a missing launch year are more often older drug approvals less well annotated in Pharmaprojects. H) Lack of impact of GWAS Catalog lead SNP odds ratio (OR) on RS when using the same OR breaks as used by King et al 15. N is displayed as in (B). Total N = 8,780 T-I pairs. The center is the RS point estimate and error bars are Katz 95% confidence intervals.
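
The Simpson's paradox described for panel C can be reproduced with a small numerical sketch. The counts below are hypothetical and chosen only to make the effect visible; they are not the study's data. As in the caption, all genetically supported pairs are placed in oncology, so the supported proportion is the same in the "oncology" and "all indications" rows while the unsupported comparison group changes.

```python
def rs(x_sup, n_sup, x_unsup, n_unsup):
    """Ratio of launch proportions, supported vs. unsupported."""
    return (x_sup / n_sup) / (x_unsup / n_unsup)

# Hypothetical counts (launched, total) for illustration only.
# Oncology has a low baseline success rate; all supported pairs sit there.
oncology = dict(x_sup=20, n_sup=100, x_unsup=100, n_unsup=1000)
non_oncology = dict(x_sup=0, n_sup=0, x_unsup=300, n_unsup=1000)

# Pool the two strata to obtain the "all indications" row.
pooled = {key: oncology[key] + non_oncology[key] for key in oncology}

rs_oncology = rs(**oncology)   # supported 20% vs. unsupported 10%
rs_pooled = rs(**pooled)       # supported 20% vs. unsupported 20%

print(f"RS within oncology:   {rs_oncology:.2f}")
print(f"RS, all indications: {rs_pooled:.2f}")
```

In this toy example the supported pairs are twice as likely to launch as unsupported oncology pairs, yet the pooled comparison shows no advantage, because the pooled unsupported group is dominated by a therapy area with a higher baseline success rate.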

Figure S5. Sensitivity to changes in genetic data and drug pipeline over the past decade and to the 'genetic insight' filter (Extended Data Figure 3) in "one target only" mode. "2013" here indicates the data freezes from Nelson et al 2015 5 (that study's supplementary dataset 2 for genetics and supplementary dataset 3 for drug pipeline); "2023" indicates the data freezes in the present study. All datasets were processed using the current MeSH similarity matrix, and because "genetic insight" changes over time (more traits have been studied genetically now than in 2013), all panels are unfiltered for genetic insight (hence numbers in panel D differ from those in Fig. 1A). Every panel shows the proportion of combined (both historical and active) target-indication pairs with genetic support, or P(G), by development phase. For this supplementary figure, note that "one target only" mode can only be applied to the 2023 drug pipeline dataset; 2013 drug pipeline information is simply used as-is from the Nelson 2015 publication and cannot be filtered for "one target only" drugs. A) 2013 drug pipeline and 2013 genetics. B) 2013 drug pipeline and 2023 genetics. C) 2023 drug pipeline and 2013 genetics. D) 2023 drug pipeline and 2023 genetics. E) 2023 drug pipeline with only OTG GWAS hits through 2013 and no other sources of genetic evidence. F) 2023 drug pipeline with only OTG GWAS hits for all years, no other sources of genetic evidence. We note that the increase in P(G) over the past decade 5 is almost entirely attributable to new genetic evidence (e.g. contrast B vs. A, D vs. C, F vs. E) rather than changes in the drug pipeline (e.g. compare A vs. C, B vs. D). In contrast, the increase in RS is due mostly to changes in the drug pipeline (compare C, D, E, F vs. A, B), in line with theoretical expectations outlined by Hingorani et al 16 and consistent with the findings of King et al 15. We note that both the contrasts in this figure, and the fact that genetic support is so often retrospective (Extended Data Fig. 2G), suggest that P(G) will continue to rise in coming years. For 2013 drug pipeline, N=8,605 preclinical, 1,772 phase I, 2,779 phase II, 636 phase III, and 1,832 launched; for 2023 drug pipeline, N=18,132 preclinical, 3,102 phase I, 4,652 phase II, 955 phase III, and 942 launched. In A-F, the center is exact proportion and error bars are Wilson binomial 95% confidence intervals. Because all panels here are unfiltered for genetic insight, we also show the difference in RS across G) sources of genetic evidence and H) therapy areas when this filter is removed. In general, removing this filter decreases RS by 0.13; this varies only slightly between sources and

Figure S6. P(G) by phase versus therapy area (Extended Data Figure 5) in "one target only" mode. Each panel represents one therapy area, and shows the proportion of target-indication pairs in that area with genetic support, or P(G), by development phase. Total number of T-I pairs in any area: N=7,884 preclinical, N=2,797 phase I, N=4,121 phase II, N=826 phase III, N=797 launched. The center is the exact proportion and error bars are Wilson binomial 95% confidence intervals. See Figure S6 for the same analyses restricted to drugs with a single known target.
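
For readers who want to reproduce the style of interval used for P(G), the following Python sketch computes an exact proportion together with a Wilson score interval. It is a generic implementation of the standard Wilson formula, not the study's own code; the function name and the example counts are hypothetical.

```python
import math

def wilson_interval(x, n, z=1.959964):
    """Return (proportion, lower, upper) for x successes out of n trials.

    The plotted center is the exact proportion x / n; the bounds come from
    the Wilson score interval, which is not symmetric around x / n.
    """
    p = x / n
    denom = 1 + z * z / n
    midpoint = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, midpoint - half_width, midpoint + half_width

# Hypothetical example: 50 genetically supported pairs out of 1,000 in a phase
p_g, lower, upper = wilson_interval(50, 1000)
print(f"P(G) = {p_g:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The Wilson interval stays within the range 0 to 1 (up to floating-point rounding) and behaves well for small proportions, which is relevant here because P(G) is typically only a few percent in each phase.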

Figure S8. Further analyses of differences in relative success among therapy areas (Extended Data Figure 7) in "one target only" mode. A) Probability of success, P(S), by therapy area, with Wilson 95% confidence intervals. The N shown at right indicates the number of launched T-I pairs (numerator) and number of T-I pairs reaching at least phase I (denominator). The center is the exact proportion and error bars are Wilson binomial 95% confidence intervals. B) Probability of genetic support, P(G), by therapy area, with Wilson 95% confidence intervals. The N shown at right indicates the number of genetically supported T-I pairs reaching at least phase I (numerator) and total number of T-I pairs reaching at least phase I (denominator). The center is the exact proportion and error bars are Wilson binomial 95% confidence intervals. C) P(S) vs. P(G), D) RS vs. P(S), and E) RS vs. P(G) across therapy areas, with centers indicating point estimates and crosshairs representing 95% confidence intervals on both dimensions: Katz for RS and Wilson for P(G) and P(S).

Figure S9. Level of utilization of genetic support among targets (Extended Data Figure 8) in "one target only" mode. As for Figure S3, but grouped by target instead of T-I pair. Thus, the denominator for each cell is the number of targets with at least one genetically supported indication, and each target counts towards the numerator if at least one genetically supported indication has reached phase I.
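
The target-level counting rule described in this caption can be sketched as follows. This is a minimal illustration under assumed inputs: the record layout, the gene labels, and the function name are hypothetical and do not come from the study's data or code.

```python
from collections import defaultdict

# Hypothetical records: (target, indication, genetically_supported, reached_phase_I)
pairs = [
    ("GENE_A", "indication_1", True, True),
    ("GENE_A", "indication_2", True, False),
    ("GENE_B", "indication_3", True, False),
    ("GENE_C", "indication_4", False, True),  # no genetic support: not counted
]

def target_level_utilization(pairs):
    """Count targets with supported indications and how many were developed.

    Denominator: targets with at least one genetically supported indication.
    Numerator: those targets for which at least one genetically supported
    indication has reached phase I.
    """
    developed = defaultdict(bool)
    for target, _indication, supported, reached_phase_1 in pairs:
        if supported:
            developed[target] = developed[target] or reached_phase_1
    numerator = sum(developed.values())
    denominator = len(developed)
    return numerator, denominator

numerator, denominator = target_level_utilization(pairs)
print(f"{numerator}/{denominator} targets with genetic support developed to phase I")
```

Grouping by target rather than by T-I pair means a target with many supported indications contributes only once, so the proportion reflects how many supported targets have been pursued at all rather than how many supported indications have been.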

supported indications (y axis). The center is the mean and bars are Wilson 95% confidence intervals. N = 937 targets. D) Proportion of D-I pairs with genetic support, P(G) (x axis), as a function of each D-I pair's phase reached (inner y axis grouping) and the drug's highest phase reached for any indication (outer y axis grouping). The center is the exact proportion and bars are Wilson 95% confidence intervals. The N is indicated