Integrating molecular profiles into clinical frameworks through the Molecular Oncology Almanac to prospectively guide precision oncology

Tumor molecular profiling of single gene-variant (‘first-order’) genomic alterations informs potential therapeutic approaches. Interactions between such first-order events and global molecular features (for example, mutational signatures) are increasingly associated with clinical outcomes, but these ‘second-order’ alterations are not yet accounted for in clinical interpretation algorithms and knowledge bases. We introduce the Molecular Oncology Almanac (MOAlmanac), a paired clinical interpretation algorithm and knowledge base to enable integrative interpretation of multimodal genomic data for point-of-care decision making and translational-hypothesis generation. We benchmarked MOAlmanac to a first-order interpretation method across multiple retrospective cohorts and observed an increased number of clinical hypotheses from evaluation of molecular features and profile-to-cell line matchmaking. When applied to a prospective precision oncology trial cohort, MOAlmanac nominated a median of two therapies per patient and identified therapeutic strategies administered in 47% of patients. Overall, we present an open-source computational method for integrative clinical interpretation of individualized molecular profiles. Van Allen and colleagues develop a data-integration framework with an underlying knowledge base supporting clinical decision making and also serving as a hypothesis-generating platform, which the authors benchmark and validate across several retrospective cohorts and a prospective precision oncology trial.

Targeted panels or whole-exome sequencing (WES) now routinely inform the clinical care of oncology patients 1 . The resulting collections of patient-specific cancer genome alterations are valuable resources in the advancement of precision medicine. However, the growing quantity and complexity of potentially actionable genomic alterations available for each patient limit the ability of any individual clinician or researcher to interpret them. This challenge necessitated the creation of clinical interpretation algorithms to computationally prioritize large sets of patient-specific alterations by clinical and biological relevance, and exposed the need to pair these interpretation algorithms with up-to-date knowledge bases that link molecular alterations to relevant clinical actions.
Clinical decision making in precision oncology commonly emphasizes 'first-order' relationships (pairing individual somatic variants, copy number alterations, pathogenic germline variants, or fusions with specific clinical actions such as use of inhibitors of BRAF p.V600E and kinases RAF and/or MEK) based on approvals from the Food and Drug Administration (FDA) and other clinical evidence [2][3][4][5][6][7] . While these efforts have been highly fruitful, they also have certain limitations. Many academic and commercially available targeted panels focus primarily on somatic variants and copy number alterations; often, they do not sequence associated germline tissue or comprehensively assess fusions 1 . Yet pathogenic germline variants impact cancer risk and can also modify clinical interpretation of secondary somatic events in the same gene or that of genome-wide mutational signatures (for example, DNA repair) 8,9 . Similarly, the approval of inhibitors of TRK for patients with any solid tumor harboring NTRK fusions and other biological insights gained from somatic variants that can be identified from RNA may warrant expanding routine clinical sequencing to jointly evaluate a patient's genomic and transcriptional data 10,11 . In addition, the ongoing characterization of the cancer genome has revealed the importance of considering these first-order events in tandem as well as 'second-order' molecular features, genomic processes such as microsatellite instability and tumor mutational burden (TMB) that are global rather than limited to individual gene(s). Such processes have also been associated with clinical phenotypes, such as signature 6 from the Catalogue of Somatic Mutations in Cancer (COSMIC) correlating with mismatch repair deficiency and microsatellite instability linked to cancer immunotherapy response 12 .
Lastly, even with the consideration of these additional features and second-order relationships, some patients may be variant negative and thus may not qualify for genomically guided treatment. To address this challenge, multiple efforts have demonstrated that cancer cell lines can also inform treatment selection, but such approaches are constrained, both by the limited molecular diversity of cancer cell lines and computational difficulty in matchmaking, to identify which models are most representative of an individual patient's tumor [13][14][15][16][17] .
To maximize interpretability of integrative molecular profiling for point-of-care treatment decision making and translational-hypothesis generation, new methodologies are needed to leverage both first-order and second-order molecular alterations, relationships between multiple co-occurring events, and the full spectrum of both clinical and preclinical evidence. Here, we introduce MOAlmanac, a clinical interpretation algorithm paired with an alteration-action database ( Fig. 1) that operates on germline, somatic and transcriptional data in tandem from individual patients. MOAlmanac expands the scope of considered molecular alterations beyond somatic variants and copy number alterations to include fusions, germline variants, and concordance between events across feature types. In addition, MOAlmanac considers global 'second-order' molecular features and introduces a profile-to-cell line matchmaking module to leverage cell line profiling to nominate additional genomic features potentially associated with therapeutic sensitivity. MOAlmanac is provided in a cloud-based framework and delivers reports at the level of the individual patient. By integrating diverse data sources with higher-order interpretation, MOAlmanac expands the landscape of clinical actionability to facilitate point-of-care decision making and to advance precision cancer medicine.

Results
Developing an integrated interpretation framework. MOAlmanac is a clinical interpretation method that evaluates individual patient  [18][19][20][21][22][23] . Next, they are further prioritized based on associations between specific alterations and each data source. For instance, GNAS p.R201H will rank higher than PRDM14 p.F204V because, although both genes and protein changes exist in Cancer Hotspots, GNAS is a CGC gene while PRDM14 is not and neither are reported in 3D Cancer Hotspots.
The clinical relevance of each cancer-associated molecular feature is further assessed based on an underlying custom knowledge base that contains 790 assertions relating molecular features to therapeutic sensitivity, resistance, and prognosis based on published literature and guidelines across 58 cancer types. This resource evolved from our prior actionability database (Tumor Alterations Relevant for Genomics-driven Therapy (TARGET)), which represented entries as genes and data types 2 (Fig. 1b, Methods, and Supplementary Table 2). By contrast, MOAlmanac defines molecular features broadly to encompass varying types of alterations backed by cited evidence. For example, MOAlmanac is capable of recording information regarding specific singleton features (for example, BRAF p.V600E) but also more general event classes (such as the presence of an ALK fusion without regard to the fusion partner). Relationships between molecular features and treatment response are annotated for targeted therapies (472 assertions), immunotherapies (50), chemotherapies (43), radiation therapy (15), hormonal treatments (9) and combination therapies (17) (Fig. 1c and Methods). Individual genomic events that match cataloged features are labeled by the specificity of the underlying event and match completeness (Extended Data Fig. 1   missense variant matching to a cataloged ATM nonsense variant; and events for which the gene appears in the database under a different data type are highlighted as 'biologically relevant' but are not associated with a clinical assertion, for example, a CDKN2A somatic variant matching to CDKN2A copy number deletions. 
These assertions are derived from numerous evidence sources in accordance with existing frameworks [3][4][5]24 , including FDA approvals (FDA approved), clinical guidelines (guideline), results from prospective clinical trials (clinical trial), results from human studies other than a clinical trial (clinical evidence), findings from cancer cell lines or animal models (preclinical), or inferences from mathematical models or associations between molecular features (inferential) ( Fig. 1c and Methods). MOAlmanac also characterizes individual features in concert with each other and second-order genomic events. For each MOAlmanac gene, events across all feature types are reported together to elucidate contributions from distinct types of genomic events. Somatic variants in a given gene will increase in priority if either a truncating or a pathogenic or likely pathogenic (according to ClinVar) germline variant appears in the same gene or if the somatic variant is observed with sufficient power in validation sequencing, if provided 24,25 . Both COSMIC mutational signature contributions and TMB are calculated and variants related to microsatellite instability are highlighted. Tumor ontology is mapped with OncoTree. Tumor purity, ploidy, whole-genome doubling, and microsatellite-stability status are also accepted for reporting and evaluation. All nominated clinical associations are reported in a web-based actionability report (Methods).
Expanded clinical actionability in retrospective cohorts. We first evaluated MOAlmanac relative to our prior established WES first-order interpretation framework (Precision Heuristics for Interpreting the Alteration Landscape (PHIAL) with TARGET), which considers somatic variants and copy number alterations 2 . WES and RNA sequencing (RNA-seq) data were acquired for 110 previously published patients with metastatic melanoma (n = 44 with RNA) 26 , 150 patients with metastatic castration-resistant prostate cancer (mCRPC, n = 149 with RNA) 27 , 100 patients with primary kidney papillary renal cell carcinoma (KIRP, n = 100 with RNA) 28 , and 59 pediatric patients with osteosarcoma (OS, n = 34 with RNA) 29 . These cohorts and tumor types were chosen to represent a wide range of putative actionability landscapes. All profiles were analyzed to call somatic variants, germline variants, and copy number alterations from WES data and somatic variants and fusions from RNA-seq data (Methods).
We compared how often the two methods observed a clinically relevant event associated with therapeutic sensitivity, resistance, or prognosis when only somatic variants and copy number alterations were considered (Fig. 2a,c and Supplementary Table 3). Furthermore, we characterized only well-established relationships by restricting our analysis to assertions curated from FDA approvals, clinical guidelines, clinical trials, or clinical evidence. MOAlmanac identified 412 such putatively actionable events from 253 patients (73 with melanoma, 118 with mCRPC, 37 with KIRP, and 25 with OS), 227 (55.1%) of which were flagged by PHIAL for clinical relevance. For example, the most commonly flagged features were BRAF p.V600E (39 patients) for metastatic melanomas, AR amplifications (82 patients) in mCRPC, MET amplifications (18 patients) in KIRP and RB1 deletions (12 patients) in OS. When 'investigate actionability' variants were included, an additional 93 patients (22.2% of the cohort) harbored a potentially clinically relevant variant, such as NRAS p.Q61K (10 patients with melanoma) with associated sensitivity to selumetinib, 43 of which were also highlighted by PHIAL. PHIAL identified two events as 'putatively actionable' and 186 events as 'investigate actionability', which were not highlighted by MOAlmanac; however, none of the genes associated with these events were migrated to MOAlmanac from TARGET, for reasons such as insufficient evidence of clinical relevance (Methods).
Next, while still limiting our analysis to somatic variants and copy number alterations, we investigated how the inclusion of preclinical and inferential evidence sources affected identification of potentially actionable results. On the basis of preclinical evidence, 164 such genomic events from 140 patients were identified (for example, PTEN deletions and sensitivity to everolimus or AZD8186), 91 (55.49%) of which were also highlighted by PHIAL. Inferential evidence highlighted 24 additional putatively actionable copy number alterations from 24 patients, most prominently CCND1 amplifications for reported sensitivity to palbociclib (n = 15). Thus, using all cataloged evidence, MOAlmanac noted 1,445 somatic variants and copy number alterations as 'putatively actionable' or 'investigate actionability' across 365 patients (109 with melanoma, 142 with mCRPC, 72 with KIRP, 42 with OS). Of these events, PHIAL highlighted 79 (5.5%) as 'putatively actionable' , 374 (25.9%) as 'investigate actionability' , and 390 (27%) as 'biologically relevant' (Fig. 3).
We then evaluated whether an expanded set of molecular features (including germline variants and fusions as additional first-order features and TMB, mutational signatures, and aneuploidy as second-order features, none of which are handled by PHIAL) could further broaden the actionability landscape for individual patients (Fig. 2b,d). Of patients who harbored alterations of such feature types, the median number of additional features observed was 1 (minimum, 1; maximum, 23). Pathogenic and likely pathogenic germline variants highlighted 13 additional clinically relevant molecular features across 13 different samples (zero for melanoma, ten for mCRPC, two for KIRP, one for OS), seven of which were BRCA1 and/or BRCA2 variants. MOAlmanac identified 137 clinically relevant fusions across 91 patients; ten mCRPC tumors harbored no putatively actionable somatic variants or copy number alterations but did contain TMPRSS2-ERG. Regarding second-order molecular features, elevated TMB was noted for 44 patients with metastatic melanoma and four patients with mCRPC (Methods); clinically relevant mutational signatures were observed in 116 molecular profiles; and whole-genome doubling, which has been associated with poor prognosis, was observed in 180 profiles 30 . In some of these cases, combinations of these features were particularly relevant when present in tandem. For example, a pathogenic BRCA2 variant, p.S1882*, was observed in one patient along with a 39% mutational signature attribution to COSMIC signature 3, both of which may suggest homologous-recombination repair deficiency and sensitivity to poly(ADP-ribose) polymerase (PARP) inhibition [31][32][33] .
By considering these feature types, MOAlmanac identified an additional 557 clinically relevant molecular features in 329 patients, resulting in 395 patients with at least one event associated with therapeutic sensitivity, resistance, or prognosis ( Fig. 3).
In total, MOAlmanac found at least one clinically relevant feature for 100% of evaluated patients with metastatic melanoma, 99.3% of those with mCRPC, 85% of those with KIRP and 86.4% of those with OS, using evidence ranging from FDA approvals to inferential relationships and both first-order and second-order molecular features. By comparison, PHIAL identified such somatic variants and copy number alterations in 91.8% of patients with metastatic melanoma, 87.3% of those with mCRPC, 27% of those with KIRP and 61% of those with OS ( Fig. 4a). Thus, the inclusion of additional feature types and evidence for clinical interpretation provided patients with an expanded set of clinical hypotheses.
Focusing specifically on therapeutic sensitivity, additional evidence sources provided otherwise variant-negative patients with clinical hypotheses (Fig. 4b). FDA-approved or clinical-guideline associations resulted in a highlighted therapy for 235 of 419 patients (79 with melanoma, 109 with mCRPC, 36 with KIRP, and 11 with OS); 16 patients obtained a therapeutic hypothesis from feature types other than somatic variants and copy number alterations, such as pathogenic BRCA2 germline variants (two patients) or NTRK fusions (one patient). Inclusion of preclinical evidence provided 68 otherwise variant-negative patients with a therapeutic hypothesis and an additional 28 patients due to inferential evidence, for example, CDKN2A and/or CDKN2B deletions and sensitivity to EPZ015666 (12 patients).
Leveraging preclinical models for clinical actionability. We next investigated whether preclinical data from high-throughput therapeutic screens of cancer cell lines could further inform clinical interpretation within the MOAlmanac methodology. We identified 452 solid tumor cell lines from the Cancer Cell Line Encyclopedia and Sanger Institute's Genomics of Drug Sensitivity in Cancer (GDSC) that had available data on nucleotide variants, copy number alterations, fusions, and drug sensitivity (Methods) 34,35 . Of MOAlmanac's 137 cataloged therapies, 44 were represented in the current GDSC2 dataset, and 15 additional therapies were represented only in the older GDSC1 dataset. These 59 therapies are involved in 274 cataloged assertions between genomic alterations and therapeutic sensitivity, for each of which MOAlmanac evaluates sensitivity for wild-type cell lines versus those harboring the corresponding or related alterations. For example, in the case of the cataloged preclinical relationship between PIK3CA p.H1047R and sensitivity to pictilisib, MOAlmanac reports sensitivity for wild-type cell lines versus those harboring any genomic alteration in PIK3CA, any nonsynonymous variant in PIK3CA, any missense variant in the gene, and those specifically with the p.H1047R variant (Extended Data Fig. 2). Across all evaluable relationships asserting sensitivity, 18 therapies showed a significant difference in the half-maximum inhibitory concentration (IC50) between wild-type and mutant cell lines (Supplementary Table 4 and Methods). Thus, high-throughput therapeutic screens of cancer cell lines are used as an orthogonal axis of evidence to evaluate clinically relevant relationships nominated by MOAlmanac.
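The nested comparison described above (wild-type cell lines versus progressively more specific mutant groups) can be illustrated with a minimal sketch. The cell line names, alteration records, and IC50 values below are hypothetical toy data, not GDSC values, and the Mann-Whitney U statistic is used here only as an assumed example of a two-group comparison; this section does not state which test MOAlmanac applies.

```python
from statistics import median

# Hypothetical toy data: cell line -> [(gene, variant class, protein change)].
ALTERATIONS = {
    "line_a": [("PIK3CA", "missense", "p.H1047R")],
    "line_b": [("PIK3CA", "missense", "p.E545K")],
    "line_c": [("PIK3CA", "nonsense", "p.R88*")],
    "line_d": [("PIK3CA", "amplification", None)],
    "line_e": [("TP53", "missense", "p.R175H")],
    "line_f": [],
}
# Hypothetical IC50 values for one therapy (lower means more sensitive).
IC50 = {"line_a": 0.2, "line_b": 0.4, "line_c": 0.9,
        "line_d": 1.1, "line_e": 2.5, "line_f": 3.0}

NONSYNONYMOUS = {"missense", "nonsense", "frameshift", "splice_site"}


def mutant_groups(gene, protein_change):
    """Nested mutant groups for a gene, from broadest to most specific."""
    def lines_with(predicate):
        return {line for line, alts in ALTERATIONS.items()
                if any(g == gene and predicate(cls, pc) for g, cls, pc in alts)}
    return {
        "any alteration": lines_with(lambda cls, pc: True),
        "nonsynonymous variant": lines_with(lambda cls, pc: cls in NONSYNONYMOUS),
        "missense variant": lines_with(lambda cls, pc: cls == "missense"),
        protein_change: lines_with(lambda cls, pc: pc == protein_change),
    }


def mann_whitney_u(xs, ys):
    """U statistic for xs versus ys: each pair with x < y adds 1, ties add 0.5."""
    return sum(1.0 if x < y else 0.5 if x == y else 0.0 for x in xs for y in ys)


def compare_to_wild_type(gene, protein_change):
    """Summarize each mutant group against lines with no alteration in the gene."""
    groups = mutant_groups(gene, protein_change)
    wild_type = [IC50[line] for line in IC50 if line not in groups["any alteration"]]
    summary = {}
    for label, lines in groups.items():
        mutant = [IC50[line] for line in sorted(lines)]
        summary[label] = {
            "n_mutant": len(mutant),
            "median_mutant": median(mutant),
            "median_wild_type": median(wild_type),
            "u_statistic": mann_whitney_u(mutant, wild_type),
        }
    return summary


RESULT = compare_to_wild_type("PIK3CA", "p.H1047R")
```

Keeping the wild-type group fixed while the mutant definition narrows makes the four comparisons directly comparable; in practice a p-value and multiple-testing correction would be computed with a statistics library.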
The above approach simplistically compares sensitivity between cell lines that do or do not share a single specific molecular feature. A potential limitation of this approach is that it includes cell lines that share the index feature but are otherwise genomically highly dissimilar, and therefore their overall biological relevance to the underlying patient sample may be questionable. Therefore, we were motivated to identify cancer cell lines that shared more extensive similarities in their molecular profiles and investigate whether such 'profile-to-cell line matchmaking' could identify additional potential therapeutic sensitivities. Previous approaches have evaluated genomic similarity based on shared mutated genes that are weighted by their recurrence in The Cancer Genome Atlas (TCGA) 15,16 ; however, we chose to assess models based on shared therapeutic sensitivity independent of histology-specific priors. We evaluated several models on cell lines using a hold-one-out approach (Methods). For each cell line, we determined whether its nearest neighbor shared drug sensitivity to any GDSC therapy ( Fig. 5a and Methods). Similarity network fusion applied to nucleotide variants, copy number alterations, and rearrangements involving CGC genes and genomic alterations associated with FDA approvals most frequently assigned a nearest neighbor that shared drug sensitivity (19.1%, Fig. 5b and Methods) 36 . A cell line harboring at least one alteration associated with an FDA approval resulted in that feature(s) being shared with the nearest neighbor in 75% of cases (154 of 205). When considering all evaluated cell lines (n = 377), profiles shared 22.5% of CGC genes altered, primarily driven by copy number alterations (median, 24.2%; minimum, 0%; maximum, 85.7%), followed by somatic variants (median, 18.2%; minimum, 0%; maximum, 59.1%), and then rearrangements (median, 0%; minimum, 0%; maximum, 100%) (Extended Data Fig. 3 and Methods).
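The hold-one-out evaluation can be sketched as follows. This is a simplified illustration, not the published model: it substitutes Jaccard similarity over sets of altered genes for similarity network fusion, and the cell line identifiers, gene sets, and sensitivity calls are hypothetical toy data.

```python
# Hypothetical toy profiles: cell line -> set of altered genes.
ALTERED_GENES = {
    "cl1": {"BRAF", "CDKN2A", "PTEN"},
    "cl2": {"BRAF", "CDKN2A"},
    "cl3": {"KRAS", "TP53"},
    "cl4": {"KRAS", "TP53", "SMAD4"},
    "cl5": {"EGFR"},
}
# Hypothetical sensitivity calls: cell line -> therapies it is sensitive to.
SENSITIVE_TO = {
    "cl1": {"dabrafenib"},
    "cl2": {"dabrafenib", "PLX-4720"},
    "cl3": {"trametinib"},
    "cl4": set(),
    "cl5": {"erlotinib"},
}


def jaccard(a, b):
    """Jaccard similarity of two gene sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbor(query, candidates, profiles):
    """Most similar candidate to the query; ties resolve to the first name alphabetically."""
    return max(sorted(candidates),
               key=lambda name: jaccard(profiles[query], profiles[name]))


def hold_one_out(profiles, sensitivities):
    """Fraction of cell lines whose nearest neighbor shares sensitivity to any therapy."""
    hits = 0
    for held_out in profiles:
        others = [name for name in profiles if name != held_out]
        neighbor = nearest_neighbor(held_out, others, profiles)
        if sensitivities[held_out] & sensitivities[neighbor]:
            hits += 1
    return hits / len(profiles)


SHARED_FRACTION = hold_one_out(ALTERED_GENES, SENSITIVE_TO)
```

The evaluation criterion is boolean per cell line (any shared sensitive therapy), which mirrors how the nearest-neighbor result is scored in the text; a different similarity model can be swapped in by replacing `jaccard`.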
This profile-to-cell line matchmaking module was then applied to our previously characterized patient cohorts (Fig. 5c). Within the mCRPC cohort, the most common nearest-neighbor cell line among the 452 tested cell lines was VCaP, one of two prostate cancer cell lines, for 25 of 150 patients. Nearest-neighbor cell lines to patients with metastatic melanoma were frequently sensitive to MEK and RAF inhibitors, including SB590885, dabrafenib, and PLX-4720 (vemurafenib, Fig. 5c). Although the most common nearest neighbor was a liver-derived cancer cell line and was not skin derived (SKHEP1), it harbored a BRAF p.V600E somatic variant. Furthermore, the nearest neighbor of 26 of 110 melanoma profiles was a skin-derived cell line, and 36 of 39 profiles that were BRAF p.V600E mutants shared this event with their nearest neighbor. The method reports sensitive therapies for all genomically similar cell lines.
Integrated clinical interpretation of a prospective trial. We lastly compared therapeutic strategies nominated by the complete MOAlmanac methodology with those administered to 83 patients in Investigation of Profile-Related Evidence Determining Individualized Cancer Therapy (I-PREDICT, NCT02534675), a prospective clinical trial evaluating personalized therapies based on panel sequencing (Foundation Medicine's FoundationOne) 37 . Citations and relationships between molecular features and clinical action from the study were reviewed and categorized by MOAlmanac evidence levels (Supplementary Table 5). MOAlmanac processed the 524 molecular features reported for I-PREDICT's 83 patients on a per-patient basis. Therapies administered in the study (45 unique therapies) or highlighted by our method (40 therapies) were categorized by therapeutic strategy according to expert review based on shared pathway targets, resulting in a total of 33 unique strategies (Supplementary Table 5). An overlap in recommended therapeutic strategy was observed in 39 (47%) patients (Fig. 6a), 31 of which involved a therapy most prioritized for the patient by MOAlmanac. For patient-therapy pairs highlighted by MOAlmanac based on FDA evidence or clinical guidelines, 60% were involved in a therapeutic strategy administered by the study. Of the ten patients with a therapy highlighted by MOAlmanac associated with 'FDA approved' or 'guideline evidence' that were not involved in an overlapping strategy, one patient had another therapy that used a strategy administered by I-PREDICT and the remaining nine nominated therapies are approved for other disease contexts. For nominations based on weaker evidence categories, the concordance was 18% for preclinical evidence and 50% for inferential evidence (Fig. 6b). The most common concordant strategies were estrogen receptor (ER) signaling, PI3K-AKT-mTOR, and PD-1-PD-L1 inhibition (nine, nine and eight patients, respectively). 
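The strategy-level concordance reported above can be illustrated with a minimal sketch. The therapy-to-strategy pairings reuse examples mentioned in this article, but the patient identifiers and their therapy lists are hypothetical, and the published mapping was assigned by expert review rather than by a lookup table.

```python
# Therapy -> therapeutic strategy; a small illustrative subset only.
STRATEGY = {
    "everolimus": "PI3K-AKT-mTOR inhibition",
    "pictilisib": "PI3K-AKT-mTOR inhibition",
    "AZD8186": "PI3K-AKT-mTOR inhibition",
    "lenvatinib": "VEGF-VEGFR inhibition",
    "infigratinib": "FGFR inhibition",
    "EPZ015666": "PRMT5 inhibition",
}

# Hypothetical patients: therapies administered in a trial and nominated by an algorithm.
ADMINISTERED = {
    "p1": ["everolimus"],
    "p2": ["lenvatinib"],
    "p3": ["lenvatinib"],
    "p4": ["everolimus"],
}
NOMINATED = {
    "p1": ["AZD8186", "pictilisib"],
    "p2": ["infigratinib"],
    "p3": ["EPZ015666"],
    "p4": ["pictilisib", "EPZ015666"],
}


def strategies(therapies):
    """Set of strategies covered by a list of therapies (unmapped therapies are ignored)."""
    return {STRATEGY[t] for t in therapies if t in STRATEGY}


def concordance(administered, nominated):
    """Patients with at least one shared strategy, and their fraction of all patients."""
    shared = sorted(
        patient for patient, given in administered.items()
        if strategies(given) & strategies(nominated.get(patient, []))
    )
    return shared, len(shared) / len(administered)


SHARED_PATIENTS, FRACTION = concordance(ADMINISTERED, NOMINATED)
```

Comparing at the level of strategy rather than individual drug is what allows, for example, an administered everolimus and a nominated pictilisib to count as agreement.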
Of strategies that were not shared, I-PREDICT favored vascular endothelial growth factor (VEGF) inhibition for patients with TP53 alterations (18 patients), whereas MOAlmanac frequently highlighted assertions such as protein arginine methyltransferase (PRMT5) inhibition (13 patients) based on a preclinical relationship showing efficacy of EPZ015666 for CDKN2A and/or CDKN2B deletions (Fig. 6c). Finally, using our profile-to-cell line matchmaking module, we showed that nearest-neighbor cell lines were sensitive to a median of two therapies. For example, I-PREDICT administered everolimus and MOAlmanac highlighted AZD8186 and pictilisib in the case of study ID 105, a 60-year-old female with breast cancer. The nearest-neighbor cell line CAL-29 (bladder carcinoma) was sensitive to taselisib and alpelisib as reported by GDSC2, both of which also target PI3K-AKT-mTOR. In another case, I-PREDICT administered lenvatinib and ramucirumab for VEGF-VEGF receptor (R) inhibition to study ID A009, a 44-year-old male with esophageal adenocarcinoma. MOAlmanac highlighted infigratinib for fibroblast growth factor receptor (FGFR) inhibition for therapeutic sensitivity, and the nearest-neighbor cancer cell line A204 (soft tissue) was sensitive to both VEGF and FGFR inhibition (VEGF, cediranib, linifanib, motesanib, ponatinib and tivozanib; and FGFR, ponatinib). Thus, MOAlmanac recapitulates established decision-making paradigms in a prospective pan-cancer setting and extends potential assertions in new therapeutic directions in other settings.
Fig. 6. We investigated whether MOAlmanac could highlight similar therapeutic strategies that were used by real-world evidence. MOAlmanac was applied to the I-PREDICT trial, which evaluated efficacy of molecularly matched therapies in 83 patients. Therapies and corresponding molecular features were mapped to therapeutic strategies for those administered in I-PREDICT and highlighted by MOAlmanac. a, A shared therapeutic strategy was observed in 39 (47%) patients, 31 of whom involved a therapy most prioritized for the patient by MOAlmanac. b, MOAlmanac nominated therapeutic strategies applied for a given patient more often for those based on well-established evidence (that is, FDA approvals; 60% of therapy-patient pairs) relative to less-established evidence, such as preclinical evidence (18%). c, Therapeutic strategies, individual therapies, and genes and molecular features as administered or targeted by I-PREDICT and highlighted by MOAlmanac. TMB-Int, tumor mutational burden intermediate. Data are available as source data.

Discussion
Here, we present a clinical interpretation method paired with a new knowledge base to facilitate decision making in precision oncology. In addition to first-order feature consideration, MOAlmanac considers second-order molecular features such as mutational signatures, TMB, microsatellite stability, and ploidy, as well as high-throughput therapeutic screens of cancer cell lines. In sum, MOAlmanac addresses two key needs for precision cancer medicine: (1) point-of-care individualized patient treatment considerations based on complex molecular interactions that consider evidence beyond FDA approvals and clinical guidelines and (2) new therapeutic hypotheses based on integrative interpretations that can be evaluated in preclinical follow-up and prospective trials. When applied to retrospective cohorts, we observed that these new features of MOAlmanac (assessment of second-order genomic features and consideration of preclinical or inferential evidence) provided additional hypotheses for prognosis and therapeutic sensitivity and resistance, especially for otherwise variant-negative tumors. MOAlmanac enables rapid contextualization of clinically relevant molecular features by associating them with assertions and cited evidence based on match to underlying genomic evidence.
While individual precision oncology studies require fixed versions of alteration-action knowledge bases, the rapidly expanding scope of literature on which these databases are based requires constant updating, which makes prospective assessment of precision oncology programs difficult. This challenge was evident when comparing MOAlmanac to the I-PREDICT trial, as differences in match selection were driven by differences in therapeutic evidence and approvals at different time points, variable knowledge capture of the vast precision oncology hypothesis landscape, and levels of evidence to justify treatment selection. These results suggest an urgent need to standardize genomic-based clinical trial data and aggregate knowledge bases to parse the vast literature in precision oncology and enable principled, evidence-based clinical care 5,38 . Manual curation of literature is inherently laborious, and prior efforts have encouraged crowdsourcing and meta-studies to address this challenge 4,5,39 .
Furthermore, there were areas of note that could specifically improve our evaluation of profile-to-cell line matchmaking for translational-hypothesis generation. First, not all cell lines were tested with every therapy; if they were, the shared drug response could be characterized in a more nuanced manner than the current boolean status. Second, there is likely an opportunity to develop improved genomic similarity models that align with therapeutic sensitivity. The advent of large, clinically annotated and molecular-profiled patient cohorts may enable these techniques and patient-similarity networks to be evaluated for precision cancer medicine on patient profiles rather than cancer cell lines 1,40,41 . Indeed, our primary motivation is to develop similarity metrics that account for multiple data types from tumors to properly leverage nearest-neighbor approaches. These approaches, which prospectively leverage genomic data rather than retrospectively curated data sources, are imperative to develop therapeutic hypotheses for patients who are variant negative.
In conclusion, MOAlmanac catalyzes the use of expanded feature types, evidence sources, and algorithms for clinical interpretation of integrative molecular features for precision cancer medicine applications. Incorporation of MOAlmanac into future translational studies and clinical trials may directly enable evaluation of the precision oncology hypothesis across patient populations. Furthermore, MOAlmanac can promote evaluation of patient-similarity networks using both clinical and preclinical knowledge to aid precision cancer medicine at the individual patient level for translational discovery. MOAlmanac is available at https://moalmanac.org. This method is available on GitHub (https://github.com/vanallenlab/moalmanac), Docker Hub (https://hub.docker.com/r/vanallenlab/moalmanac), and on the Broad Institute's Terra (https://portal.firecloud.org/#methods/vanallenlab/moalmanac/7). In addition, a web portal to process individual cases through a user interface atop Terra is available at https://portal.moalmanac.org/. All code related to analyses and figures in this study can be found on GitHub (https://github.com/vanallenlab/moalmanac-paper). Finally, to facilitate crowdsourced updating of MOAlmanac's knowledge base, MOAlmanac Connector (a Google Chrome extension) is available to enable users to nominate relationships with minimal effort.

Methods
Iterating from TARGET. TARGET cataloged clinical assertions primarily by gene associated with types of recurrent alterations and examples of therapeutic agents paired with an aggregate rationale for the gene. Literature review was performed by curators to review FDA approvals, clinical guidelines, and journal articles to associate clinical assertions from TARGET with a citation. Of the 121 genes cataloged, 59 genes were retained and migrated to MOAlmanac if a citation could be found for at least one rationale and feature type associated with the gene. Of the 62 genes that were not cataloged, supporting citations could not be found for 51, eight were diagnostic assertions that are not cataloged by MOAlmanac, two suggested the presence of a germline variant (an assertion type not cataloged by MOAlmanac), and one was not included due to conflicting evidence. The assertion not migrated due to conflicting evidence was that MTOR activating mutations predict sensitivity to mTOR inhibitors. TARGET data were obtained as supplementary table 7 from Van Allen et al. 2 and annotated with the aforementioned categorizations (Supplementary Table 2).
Cataloging additional assertions. Subsequent curation efforts cataloged FDA approvals, clinical guidelines, conference abstracts, or recently published literature. Relationships were categorized by the clinical implication of the assertion (therapeutic sensitivity or resistance or prognosis), therapy type (if relevant), and evidence. Genomic feature types considered were somatic and germline variants, copy number alterations, rearrangements, mutational burden, COSMIC mutational signatures (version 2), microsatellite-stability status, and aneuploidy.
The knowledge base contained 790 assertions that relate molecular features to therapeutic response and prognosis and four related to adverse-event risk, manually curated from literature review of FDA approvals (155 assertions), clinical guidelines (188), published journal articles (442) and abstracts (five). In addition to characterizing targeted therapies (472 assertions), we have cataloged relationships related to immunotherapies (50), chemotherapies (43), radiation (15), hormonal treatments (nine) and combination therapies (17; Fig. 1c). MOAlmanac catalogs both positive and negative studies and currently contains 13 assertions asserting that a molecular feature does not correlate with therapeutic sensitivity and 92 assertions associated with unfavorable prognosis.
No further assertions were added to MOAlmanac past 4 February 2021 for the purposes of this study (database release version 2021-02-04).
Comparison to other knowledge bases. MOAlmanac was categorically compared to CIViC and OncoKB (both accessed 4 February 2021), two similar precision oncology knowledge bases, across the categories of therapy types, molecular feature types, assertion types, cataloged evidence, curation type, accessibility, number of assertions, and counted therapy types (Supplementary Table 6). Citations with PubMed reference numbers (PMIDs), therapies, and genes cataloged were compared, and we observed findings similar to those of previous meta-studies, in that no one database subsumed another (Extended Data Fig. 4) 39 .
Developing a clinical interpretation method. MOAlmanac accepts any combination of somatic variants, copy number alterations, rearrangements, germline variants, somatic variants from secondary (such as validation or orthogonal) sequencing, and breadth of coverage as inputs. MOAlmanac considers individual nonsynonymous variants (missense, nonsense, nonstop and frameshift mutations, insertions and deletions), copy number alterations that are outside of 1.96 standard deviations from the mean of unique segment means (above the 97.5th percentile for amplifications and below the 2.5th percentile for deletions), and fusions supported by at least five spanning fragments. Several single-value or boolean features are also accepted, such as purity and ploidy of the tumor as float values, a categorical input for microsatellite-stability status, and a boolean for whole-genome doubling. Provided tumor types are mapped to standardized ontology terms and codes using OncoTree 42 .
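As an illustration of the copy number cutoff described above, the following is a minimal sketch rather than MOAlmanac's implementation. It assumes the standard-deviation form of the threshold (mean of unique segment means plus or minus 1.96 sample standard deviations); the function name, input layout, and call labels are hypothetical.

```python
from statistics import mean, stdev


def call_copy_number(segments, z=1.96):
    """Label each (gene, segment_mean) pair as an amplification, a deletion, or neither.

    Assumption: thresholds are the mean of the unique segment means
    plus or minus z sample standard deviations.
    """
    # Thresholds are derived from unique segment means, as stated in the text.
    unique_means = sorted({segment_mean for _, segment_mean in segments})
    center = mean(unique_means)
    spread = stdev(unique_means)  # requires at least two unique values
    upper = center + z * spread
    lower = center - z * spread

    calls = []
    for gene, segment_mean in segments:
        if segment_mean > upper:
            call = "Amplification"
        elif segment_mean < lower:
            call = "Deletion"
        else:
            call = None  # not considered further
        calls.append((gene, call))
    return calls


# Hypothetical segment means for illustration only.
example_segments = [
    ("CDKN2A", -3.0), ("G1", -0.1), ("G2", -0.05), ("G3", 0.0), ("G4", 0.0),
    ("G5", 0.05), ("G6", 0.1), ("G7", 0.02), ("G8", -0.02), ("G9", 0.03),
    ("G10", -0.03), ("MYC", 3.0),
]
example_calls = dict(call_copy_number(example_segments))
```

Because the cutoff is relative to each profile's own distribution of segment means, the same segment mean can be called in one sample and not in another.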
Somatic variants, copy number alterations, and gene fusions are annotated with and sorted based on their presence in the following databases in the following order: MOAlmanac, Cancer Hotspots, 3D Hotspots, the CGC, the MSigDB, and COSMIC (Fig. 1d) 18,19,21–23 . Germline variants in genes noted by the American College of Medical Genetics and Genomics version 2, related to hereditary cancers, or related to somatic cancers (based on gene match to MOAlmanac, Cancer Hotspots, or the CGC) are highlighted (Fig. 1e) 18,21,43 . Somatic and germline variants are also annotated with ClinVar to identify pathogenic or likely pathogenic variants and with ExAC to identify common variants, defined as an allele frequency greater than or equal to 1 in 1,000 alleles 24,25 .
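The database ordering and the common-variant definition above can be sketched as follows. This is an illustrative assumption about how such a sort could work (lexicographic priority over the listed databases), not the method's actual scoring; the function names and the `databases` field are hypothetical.

```python
# Database priority as listed in the text, highest priority first.
PRIORITY = ["MOAlmanac", "Cancer Hotspots", "3D Hotspots", "CGC", "MSigDB", "COSMIC"]


def sort_by_database_presence(features):
    """Sort features so that hits in higher-priority databases come first.

    Each feature is a dict with a 'databases' set naming where it was found.
    """
    def sort_key(feature):
        hits = feature["databases"]
        # 0 sorts before 1, so presence in an earlier database wins.
        return tuple(0 if name in hits else 1 for name in PRIORITY)

    return sorted(features, key=sort_key)


def is_common_variant(exac_allele_frequency, threshold=0.001):
    """Flag a variant as common at an allele frequency of at least 1 in 1,000."""
    if exac_allele_frequency is None:  # variant not observed in ExAC
        return False
    return exac_allele_frequency >= threshold


# Hypothetical features for illustration only.
example_features = [
    {"name": "A", "databases": {"COSMIC"}},
    {"name": "B", "databases": {"MOAlmanac"}},
    {"name": "C", "databases": {"CGC", "COSMIC"}},
    {"name": "D", "databases": set()},
]
example_order = [f["name"] for f in sort_by_database_presence(example_features)]
```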
Clinically relevant associations are made solely based on a molecular feature's match to MOAlmanac and are labeled based on the match to the cataloged molecular feature and the evidence of the matched relationship (Extended Data Fig. 1). Complete matches to explicit features (for example, protein change for variants, direction for copy number alterations, or both involved genes for fusions) are labeled as 'putatively actionable', whereas partial matches or incompletely characterized features (the gene is cataloged of that data type; for example, an ETV6-NTRK1 fusion matches to an assertion of NTRK1 fusions) are labeled as 'investigate actionability'. If an alteration's gene appears in MOAlmanac but is not cataloged as the same data type, the alteration will be labeled as 'biologically relevant' and is not associated with any clinical relationships. For each provided genomic feature, a match for each type of assertion (therapeutic sensitivity, resistance, and disease prognosis) is independently searched for. If the genomic match is labeled as either 'putatively actionable' or 'investigate actionability', then the evidence level of the association, therapy name and therapy type or favorable prognosis, relationship description, citation, and URL for the citation are associated. MOAlmanac will first attempt to match to assertions of the same tumor ontology and, if unsuccessful, will match to assertions in an ontology-agnostic manner.
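The three match labels described above can be illustrated with a minimal sketch. The record layout (`gene`, `feature_type`, `detail`) and the function name are hypothetical simplifications, and ontology matching and evidence levels are omitted; this is not MOAlmanac's matching code.

```python
def label_match(feature, catalog):
    """Return the strongest label for a feature against a list of cataloged features.

    'detail' stands for the explicit characterization (protein change, copy
    number direction, or fusion partner); None means the cataloged entry is
    not further characterized.
    """
    same_gene = [entry for entry in catalog if entry["gene"] == feature["gene"]]
    if not same_gene:
        return None  # gene does not appear in the knowledge base

    same_type = [e for e in same_gene if e["feature_type"] == feature["feature_type"]]
    if not same_type:
        # Gene is cataloged, but not as this data type.
        return "biologically relevant"

    for entry in same_type:
        if entry["detail"] is not None and entry["detail"] == feature["detail"]:
            return "putatively actionable"  # complete match to an explicit feature

    return "investigate actionability"  # gene and data type match only


# Hypothetical catalog entries for illustration only.
example_catalog = [
    {"gene": "BRAF", "feature_type": "somatic_variant", "detail": "p.V600E"},
    {"gene": "NTRK1", "feature_type": "fusion", "detail": None},
    {"gene": "CDKN2A", "feature_type": "copy_number", "detail": "Deletion"},
]
```

For example, an ETV6-NTRK1 fusion, represented here as `{"gene": "NTRK1", "feature_type": "fusion", "detail": "ETV6"}`, matches the uncharacterized NTRK1 fusion entry and is labeled 'investigate actionability'.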
If somatic SNVs are provided for both primary and secondary sequencing, MOAlmanac will annotate variants called in primary sequencing based on their presence (allelic fraction and coverage) in the secondary sequencing. The power to detect variants in secondary sequencing is calculated using a β-binomial distribution with k equal to 3 for a minimum of three reads, n as coverage of the variant in secondary sequencing, and α and β defined as the alternate and reference read counts observed from primary sequencing, respectively, each plus 1. This approach is consistent with the practice of Yizhak et al. with RNA MuTect 11 . Variants observed with detection power greater than or equal to the specified minimum (default, 0.95) are noted. MOAlmanac only leverages secondary sequencing for validation and does not use it for discovery. When applied to the retrospective cohorts of metastatic melanoma and mCRPC, we had sufficient power to observe 223 of 553 applicable clinically relevant variants.
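The power calculation above can be sketched with the standard library alone. This sketch assumes that detection power is the probability of observing at least k alternate reads, P(X ≥ k), under a β-binomial distribution with the stated parameters; the function names are hypothetical and this is not the method's own code.

```python
from math import exp, lgamma


def _log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(i, n, alpha, beta):
    """Probability of exactly i successes in n trials under a beta-binomial."""
    log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_choose + _log_beta(i + alpha, n - i + beta) - _log_beta(alpha, beta))


def detection_power(alt_count, ref_count, coverage, min_reads=3):
    """Assumed definition: P(at least min_reads alternate reads in secondary sequencing).

    alt_count and ref_count come from primary sequencing; coverage is the
    depth at the variant site in secondary sequencing.
    """
    alpha = alt_count + 1
    beta = ref_count + 1
    if coverage < min_reads:
        return 0.0  # cannot observe min_reads reads at this depth
    below = sum(betabinom_pmf(i, coverage, alpha, beta) for i in range(min_reads))
    return max(0.0, 1.0 - below)


def is_powered(alt_count, ref_count, coverage, min_power=0.95):
    """Apply the default minimum power of 0.95 described in the text."""
    return detection_power(alt_count, ref_count, coverage) >= min_power
```

Under these assumptions, a variant seen at roughly 50% allelic fraction in primary sequencing is well powered at 100x secondary coverage, whereas a variant with low allelic fraction and only 10x secondary coverage is not.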
MOAlmanac additionally performs annotation and evaluation of integrative and second-order genomic features. Somatic, germline, copy number, and fusion events per gene for genes found in MOAlmanac, Cancer Hotspots, and the CGC are summarized to highlight intra-gene variation. Somatic alterations are annotated with the number of frameshift, nonstop, nonsense, or splice-site germline events within the same gene. TMB is calculated as the number of nonsynonymous variants divided by the somatic calculable bases. TMB is compared to values calculated for TCGA molecular profiles by Lawrence et al. to yield a pancan percentile and a tissue-specific percentile, if ontology matched to one of the 27 tumor types studied in the publication 44 . TMB for a molecular profile is designated as high if it is greater than ten nonsynonymous variants per megabase and greater than or equal to the 80th tissue-specific percentile, or pancan percentile if not mapped. COSMIC mutational signatures (version 2) are evaluated using deconstructSigs by running R as a subprocess using the default trinucleotide counts method 45,46 . Signatures with a contribution greater than a specified minimum contribution (default, 0.20) are annotated at least as 'biologically relevant' and annotated using MOAlmanac for consideration of actionability. Microsatellite stability is considered both directly as a categorical input for status and indirectly by highlighting potentially related variants. As a direct input, users may flag microsatellite status as microsatellite stable, microsatellite instability low, microsatellite instability high, or unknown. Genomic alterations that appear in genes related to microsatellite instability are highlighted as supporting variants and 'biologically relevant'; specifically, the genes considered are ACVR2A, DOCK3, ESRP1, JAK1, MLH1, MSH2, MSH3, MSH6, PMS2, POLE, POLE2, PRDM2, and RNF43 (refs. 47,48 ).
Whole-genome doubling, or aneuploidy, is considered as a boolean to evaluate clinical relevance as being associated with adverse survival across a pan-cancer setting 30 . Mutational burden, mutational signatures, microsatellite stability, and whole-genome doubling are at most highlighted as 'investigate actionability' by MOAlmanac for clinical assessment.
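The TMB designation described above reduces to a short calculation. The sketch below takes the percentiles as precomputed inputs, since how they are derived from the TCGA reference values is not specified here; the function and parameter names are hypothetical.

```python
def tmb_per_megabase(nonsynonymous_count, calculable_bases):
    """Nonsynonymous variants per megabase of somatic calculable bases."""
    return nonsynonymous_count / (calculable_bases / 1e6)


def is_tmb_high(tmb, tissue_percentile=None, pancan_percentile=None,
                min_tmb=10.0, min_percentile=80.0):
    """Designate TMB as high using the two criteria stated in the text.

    The tissue-specific percentile is used when the tumor type mapped to a
    reference tissue; otherwise the pancan percentile is used.
    """
    if tissue_percentile is not None:
        percentile = tissue_percentile
    else:
        percentile = pancan_percentile
    if percentile is None:
        return False  # no reference distribution available
    # Strictly greater than ten per megabase, and at or above the 80th percentile.
    return tmb > min_tmb and percentile >= min_percentile


# Hypothetical profile: 450 nonsynonymous variants over 30 Mb of calculable bases.
example_tmb = tmb_per_megabase(450, 30_000_000)
example_high = is_tmb_high(example_tmb, tissue_percentile=85.0)
```

Requiring both an absolute and a relative threshold means that a profile in a tumor type with generally high mutation rates is not designated high solely for exceeding ten variants per megabase.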
Clinical actionability reports are created for all profiles processed with MOAlmanac and generated with Python 3.6, Flask, and Frozen Flask. Because they were produced with Frozen Flask, these web-based reports are a single HTML file with no additional file dependencies; they usually are no larger than 1 MB in size. An example report is available on our website (https://portal.moalmanac.org/example).
Supplementary Table 1 contains vignettes for each feature type, showcasing example features with a rationale explaining why they matched to data sources as they did. A full specification of MOAlmanac is available on GitHub (https://github.com/vanallenlab/moalmanac).
Comparing PHIAL-TARGET and MOAlmanac with four retrospective studies. WES and RNA-seq data were acquired for 110 previously published patients with metastatic melanomas (n = 44 with RNA) 26 , 150 patients with metastatic castration-resistant prostate cancers (mCRPC, n = 149 with RNA) 27 , 100 patients with papillary renal cell carcinoma (KIRP, n = 100 with RNA) 28 , and 59 pediatric patients with OS (n = 34 with RNA) 29 . Subsequent sample processing was performed on Terra.
WES was used to call somatic and germline variants and copy number alterations. WES data were aligned to the b37 hg19 reference genome using BWA version 0.5.9, following the Broad Institute's Picard best practices (https://software.broadinstitute.org/gatk/best-practices/, https://broadinstitute.github.io/picard/). MuTect 1.1.6 was used to identify SNVs and somatic calculable bases of individual tumor samples, while Strelka version 1.0.11 was used to identify insertions and deletions (indels) 49,50 , run using the Getz laboratory CGA WES characterization pipeline at the Broad Institute. Germline variants were called using DeepVariant version 0.6.0 (ref. 51 ). Segmented total copy number was calculated across the exome by comparing fractional exome coverage to a panel of normal samples using CapSeg as implemented in GATK 3.7 (refs. 52,53 ). Tumor purity and ploidy were calculated using FACETS version 0.5.14 (ref. 54 ).
Somatic variants from both WES and RNA-seq data, germline variants, and copy number alterations were annotated using Oncotator version 1.9.1 (ref. 57 ).
Molecular features were processed for all 419 profiles by both PHIAL 1.0.0 (https://github.com/vanallenlab/phial) and MOAlmanac 0.4.1 (https://github.com/vanallenlab/moalmanac) 2 . PHIAL considered somatic variants and copy number alterations, while MOAlmanac additionally considered germline variants, rearrangements, mutational burden, mutational signatures, and whole-genome doubling. Microsatellite stability was not considered for this analysis, as labels from testing, if performed, were not available. Events that matched with the underlying knowledge base as either 'investigate actionability' or 'putatively actionable', thus stronger than simply a gene match, were considered for clinical relevance (Fig. 3). While differences were impacted by literature curation and MOAlmanac considering additional feature types, they were also impacted by changing how copy number alterations were handled; PHIAL calls copy number alterations based on a threshold (|segment mean| ≥ 1), whereas MOAlmanac uses a percentile approach (top or bottom 2.5%). Counts of events identified as clinically relevant by MOAlmanac organized by cohort, feature type, and evidence are available in Supplementary Table 3 and are illustrated by assertion type in Extended Data Fig. 5.
Expanded methods for directly leveraging preclinical models. Somatic variants and copy number alterations for cancer cell lines cataloged in the Cancer Cell Line Encyclopedia were gathered from cBioPortal, and data for fusions and therapeutic sensitivity were downloaded from the Sanger Institute's GDSC 34,35 . Data for somatic variants, copy number alterations, and fusions were formatted for usage and annotated by MOAlmanac.
All GDSC1 and GDSC2 therapies were mapped to therapies cataloged in MOAlmanac. For all therapies associated with genomic events by MOAlmanac for which a GDSC mapping exists, a sensitivity dictionary is created in which each key is associated with a clinically relevant feature found by the method. For each feature, we list all mutant and wild-type cell lines for each component; for example, for CDKN2A deletions, mutant and wild-type lists are made for all cell lines that have any alteration in CDKN2A (somatic variant, copy number alteration, or fusion), cell lines that have a CDKN2A copy number alteration, and cell lines that have a CDKN2A deletion. For each pairing of mutant and wild-type cell lines, IC50 values are compared with a two-sided Mann-Whitney-Wilcoxon test.
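The mutant versus wild-type comparison can be illustrated with a standard-library sketch. It computes the Mann-Whitney U statistic with midranks and a two-sided P value from the normal approximation with tie correction and no continuity correction; this is an assumption chosen for brevity and may differ from the exact or library-based test used in the study. All names and values are hypothetical.

```python
from math import erfc, sqrt


def split_by_status(ic50_by_cell_line, mutant_cell_lines):
    """Partition IC50 values into mutant and wild-type lists."""
    mutant = [v for name, v in ic50_by_cell_line.items() if name in mutant_cell_lines]
    wild_type = [v for name, v in ic50_by_cell_line.items() if name not in mutant_cell_lines]
    return mutant, wild_type


def mann_whitney_two_sided(x, y):
    """Return (U statistic for x, two-sided P value by normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u_x, 1.0  # all values tied; no evidence of a difference
    z = (u_x - mu) / sigma
    return u_x, erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical IC50 values for illustration only.
example_ic50 = {"line1": 1.0, "line2": 2.0, "line3": 3.0,
                "line4": 4.0, "line5": 5.0, "line6": 6.0}
example_mutant, example_wild_type = split_by_status(example_ic50, {"line1", "line2", "line3"})
example_u, example_p = mann_whitney_two_sided(example_mutant, example_wild_type)
```

With small groups, the normal approximation is coarse; an exact test would be preferable in practice.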
We sought to directly leverage molecular profiles for clinical interpretation by comparing a case molecular profile to a population and sorting members by genomic features such that the nearest neighbor to the case profile shared drug sensitivity, referred to as profile-to-cell line matchmaking. The complete protocol is available on the Nature Protocol Exchange 58 . Briefly, a hold-one-out approach was applied to considered cancer cell lines to evaluate metrics of matchmaking. Molecular similarity models were assessed based on their ability to identify cancer cell lines that share therapeutic sensitivity using evaluation metrics from ranked retrieval (Supplementary Table 7).
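The hold-one-out evaluation can be sketched as follows. Jaccard similarity over sets of altered genes is used purely as a stand-in similarity model, and precision of the nearest neighbor is used as one example of a ranked-retrieval metric; neither is claimed to be the model or metric evaluated in the study, and all names and data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of altered genes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_neighbors(case, profiles):
    """Rank all other profiles by decreasing similarity to the held-out case."""
    others = [name for name in profiles if name != case]
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(others, key=lambda name: (-jaccard(profiles[case], profiles[name]), name))


def shares_sensitivity(a, b, sensitive_to):
    """True if two cell lines are sensitive to at least one common therapy."""
    return bool(sensitive_to[a] & sensitive_to[b])


def hold_one_out_precision_at_1(profiles, sensitive_to):
    """Fraction of held-out cases whose nearest neighbor shares a sensitivity."""
    hits = 0
    for case in profiles:
        nearest = rank_neighbors(case, profiles)[0]
        hits += shares_sensitivity(case, nearest, sensitive_to)
    return hits / len(profiles)


# Hypothetical cell lines, alterations, and sensitivities for illustration only.
example_profiles = {
    "A": {"BRAF", "CDKN2A"},
    "B": {"BRAF", "CDKN2A", "PTEN"},
    "C": {"KRAS"},
    "D": {"KRAS", "TP53"},
}
example_sensitive_to = {
    "A": {"therapy_1"},
    "B": {"therapy_1"},
    "C": {"therapy_2"},
    "D": {"therapy_3"},
}
example_score = hold_one_out_precision_at_1(example_profiles, example_sensitive_to)
```

Swapping `jaccard` for another similarity function while keeping the evaluation loop fixed is what allows different similarity models to be compared on equal terms.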
Comparison to a prospective clinical trial, I-PREDICT. We compared clinical actions administered based on molecular profiles to patients in the I-PREDICT prospective clinical trial to those highlighted by MOAlmanac 37 . All genomic events considered were present in the supplementary text of the study, and we extracted molecular features, therapies administered, and citations. Disease ontologies were mapped to OncoTree 42 . Molecular features were formatted for annotation and evaluation by MOAlmanac.
Citations providing rationale for therapies administered based on molecular features were extracted from the supplementary text, obtained, read, commented on, and categorized by evidence level. Molecular features considered by the study were merged with annotations made by MOAlmanac, and, using author notes from the supplementary text, we annotated whether the study targeted the molecular feature. Therapy and associated molecular features were mapped to therapeutic strategies by expert review. Therapies administered in the study and those highlighted by MOAlmanac for therapeutic sensitivity were listed on a per-patient basis, and evidence levels were annotated for each therapy per patient. For therapies administered by the study, citations cited per patient were referenced to identify the specific relationship between therapeutic strategy, therapy, and molecular feature. Each therapy administered received a label based on the citation(s) cited by the study: the evidence tier associated with the citation, no citation (if the therapy was not administered based on molecular features), or the citation listed was not applicable (if the citation(s) listed did not mention the therapy, strategy, or target). In some cases that would have resulted in the latter label, we inferred that a source cited for another relationship in the cohort was likely the intended citation and used that source instead. Therapies were tagged with a boolean value if they were involved in a shared therapeutic strategy between what was administered in I-PREDICT and highlighted by MOAlmanac for a given patient (Supplementary Table 5).

Statistics and reproducibility.
No statistical method was used to predetermine sample sizes. Experiments were not randomized. Investigators were not blinded to allocation during experiments and outcome assessment. The present study is a retrospective study involving the application of new software to previously published data. Data exclusion occurred when preparing cohorts for the analysis of KIRPs and profile-to-cell line matchmaking. KIRPs were selected for analysis from the available 289 profiles on the basis of whether they contained both whole-exome and transcriptome sequencing data and were alphabetically present in the hosted Terra workspace to obtain 100 profiles. Cancer cell lines were excluded from analysis based on three criteria: (1) the availability of data for high-throughput drug screens, somatic variants, copy number alterations, and fusions, (2) (pre-existing) filtering to remove blood cancers, those subject to genetic drift or contaminated by fibroblasts and (3) (for evaluating profile-to-cell line matchmaking) requiring sensitivity to at least one therapy with at least one other cell line. These exclusion criteria were implemented to result in a cohort size comparable to that of the three other retrospective cohorts (n = 110, 150, and 59) and to confidently evaluate profile-to-cell line matchmaking using a hold-one-out approach. No further data were excluded from analyses.
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.