Inflation of tumor mutation burden by tumor-only sequencing in under-represented groups

With the recent FDA approval of tumor mutational burden-high (TMB-H) status as a biomarker for treatment with a PD-1 inhibitor regardless of tumor type, accurate assessment of patient-specific TMB is more critical now than ever. We analyzed paired tumor and germline exome sequencing data from 701 patients newly diagnosed with multiple myeloma, including 575 self-reported White patients and 126 self-reported Black patients. Compared to the gold standard of filtering germline variants with patient-paired germline sequencing data, TMB estimates were significantly higher in both Black and White patients when public databases were used to filter non-somatic mutations; however, TMB was significantly more inflated in Black patients than in White patients. Using TMB as a biomarker to select patients for immune checkpoint inhibitor (ICI) therapy without patient-paired germline sequencing may introduce racial bias due to the under-representation of minority groups in public databases.

npj Precision Oncology (2021) 5:22; https://doi.org/10.1038/s41698-021-00164-5
Immune checkpoint inhibitors (ICIs) have dramatically improved the survival of patients with many types of cancer. Since the autoimmune toxicities of ICIs can be fatal, it is critical to optimize patient selection criteria. The current use of PD-L1 expression levels and mismatch-repair/microsatellite-instability status has limitations. Response to ICIs is predicated upon mutations that are translated into neoantigens, which are presented by tumor cells and recognized by T cells that can eliminate those tumor cells. Defective DNA repair leads to higher tumor mutational burden (TMB), which is defined as the total number of nonsynonymous mutations per megabase (Mb) of coding regions of a tumor genome and is a surrogate for cancer neoantigens that can be recognized by the adaptive immune system. TMB was reported to predict survival after immunotherapy across multiple cancer types (ref. 1). In June 2020, the United States Food and Drug Administration (FDA) approved the use of TMB-high (TMB-H) status as a patient selection criterion for treating adult and pediatric patients with unresectable or metastatic tumors with the PD-1 inhibitor pembrolizumab, based on results from the phase 2 KEYNOTE-158 study (ref. 2). Therefore, it is more critical now than ever to have an accurate assessment of patient-specific TMB.
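The definition of TMB given above (nonsynonymous mutations per megabase of coding sequence) can be written as a short calculation. This is a minimal sketch rather than the authors' pipeline; the function name and the example numbers (180 mutations, a 30 Mb coding target) are hypothetical and chosen only for illustration.

```python
def tumor_mutational_burden(n_nonsynonymous: int, coding_bases: int) -> float:
    """Return nonsynonymous mutations per megabase (Mb) of coding sequence."""
    if coding_bases <= 0:
        raise ValueError("coding_bases must be positive")
    return n_nonsynonymous / (coding_bases / 1_000_000)


# Hypothetical example: 180 nonsynonymous mutations over a 30 Mb coding target.
tmb = tumor_mutational_burden(180, 30_000_000)
print(tmb)  # 6.0
```

The size of the coding region actually covered by the assay is the denominator, so the same mutation count yields different TMB values on panels of different sizes.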
Currently, there is no globally accepted, standardized approach for TMB calculation. The most accurate TMB estimate requires patient-paired germline sequencing to filter out non-somatic variants (ref. 3). However, since patient germline DNA (e.g., from peripheral blood) is not routinely collected in the clinic for germline analysis, TMB is often calculated from tumor-only sequencing, relying on public germline variant databases (DBs) to filter out non-somatic polymorphisms. We previously reported that filtering based on public DBs significantly inflated TMB (ref. 4). In this study, we investigated the impact of minority group representation in these DBs and hypothesized that TMB would be more greatly inflated in under-represented groups.
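The two filtering strategies contrasted above can be sketched as set operations. This is an illustrative toy, not the study's variant-calling workflow: the variant tuples, the `db_maf` lookup, the thresholds, and the rule "discard a tumor call when its population MAF is at or above the threshold" are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def filter_with_paired_germline(tumor_calls: Set[Variant],
                                germline_calls: Set[Variant]) -> Set[Variant]:
    """Keep tumor calls that are absent from the patient's own germline."""
    return tumor_calls - germline_calls


def filter_with_public_db(tumor_calls: Set[Variant],
                          db_maf: Dict[Variant, float],
                          maf_threshold: float) -> Set[Variant]:
    """Keep tumor calls whose population MAF is below the threshold.

    Variants missing from the database are treated as MAF 0 and are kept.
    """
    return {v for v in tumor_calls if db_maf.get(v, 0.0) < maf_threshold}


# Hypothetical toy data: one true somatic call and three germline variants.
somatic = ("chr1", 100, "A", "T")
common_germline = ("chr2", 200, "G", "C")   # well represented in the DB
rare_germline = ("chr3", 300, "C", "T")     # present in the DB at low MAF
private_germline = ("chr4", 400, "T", "G")  # absent from the DB

tumor_calls = {somatic, common_germline, rare_germline, private_germline}
germline_calls = {common_germline, rare_germline, private_germline}
db_maf = {common_germline: 0.20, rare_germline: 0.0005}

paired = filter_with_paired_germline(tumor_calls, germline_calls)
loose = filter_with_public_db(tumor_calls, db_maf, maf_threshold=0.01)
strict = filter_with_public_db(tumor_calls, db_maf, maf_threshold=0.0001)
print(len(paired), len(loose), len(strict))  # 1 3 2
```

In this toy case, germline variants that are rare in or missing from the database survive database filtering and are counted as somatic, which is the mechanism by which under-representation in a database would inflate TMB.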
The lack of representation of diverse ancestral backgrounds in genomic research, including individuals of African ancestry, is well known (refs. 5, 6). Of more than 60,000 individuals genotyped and sequenced, only 8.6% are of African ancestry while 54.9% are of non-Finnish European ancestry.

COMPARATIVE ESTIMATIONS OF TMB
The gold standard for identifying somatic mutations is to filter out non-somatic variants using patient-paired germline DNA sequencing data. TMBs estimated using this standard approach were comparable between tumors from Black and White patients (TMB of 6.09 ± 0.21, mean ± S.E., in Black patients; 5.47 ± 0.10 in White patients) (Table 1). However, when the public variant DBs of the 1000 Genomes Project (1000G) and the Exome Aggregation Consortium (ExAC) were used to filter non-somatic variants, the TMB estimates were significantly inflated (Figs. 1a and 2, and Table 1) in tumors from both Black and White patients, with inverse correlations between TMBs and population minor allele frequency (MAF) threshold stringencies. In addition, the TMBs in the tumors from Black patients were inflated significantly more than those in the tumors from White patients, with a race:filtering interaction p < 2e−16 by two-way ANOVA. When TMB was calculated from 1059 cancer-related genes only, similar observations were made (Fig. 1b). Importantly, while the TMBs across patients correlated well between values obtained with different population MAF thresholds of the public DBs for non-somatic variant filtering (Fig. 2, row 2, columns 3-4, and row 3, column 4), the TMB from paired germline filtering had much lower correlations with the TMB from public DB filtering at any threshold (Fig. 2, row 1, columns 2-4). This finding suggests that the impact of tumor-only sequencing on TMB estimates varied substantially from patient to patient.
In addition to 1000G and ExAC, the ESP6500 DB (ref. 7) was also tested for variant filtering (Supplementary File 2 and Supplementary Fig. 3), which also resulted in significantly more inflated TMBs in Black patients than in White patients.
TMB-H status is now an FDA-approved patient selection biomarker for ICI therapy. Because the collection of patient-matched germline samples is still not common practice in the clinic, TMBs are routinely estimated using tumor-only sequencing, which leads to significantly inflated TMB estimates (ref. 4). Here we demonstrate that TMB inflation is racially disparate, with significantly greater inflation in the tumors from Black patients due to the under-representation of minority groups in the public variant DBs used for variant filtering, regardless of whether all (Figs. 1 and 2) or race-specific (Supplementary File 2 and Supplementary Fig. 2) variants from public DBs were used. If the currently approved TMB-H threshold of ≥10 mutations/Mb were hypothetically applied to the patients studied here, a significantly higher number of Black patients would have been inappropriately selected to receive ICI therapy. It needs to be emphasized that we performed this proof-of-principle study in patients with multiple myeloma; however, the findings of racially disparate TMB inflation might be generalizable to all cancer types. Accurate TMB estimation is particularly important in cancers currently treated with ICIs, including breast, bladder, cervical, colon, head and neck, liver, lung, renal cell, stomach, and rectal cancers, as well as Hodgkin lymphoma, melanoma, and any other solid tumor that is not able to repair errors during DNA replication (https://www.cancer.gov/about-cancer/treatment/types/immunotherapy/checkpoint-inhibitors).
In addition, the inflated TMBs are likely relevant for other ethnic groups including Asians, Pacific Islanders, and other under-represented groups.
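The hypothetical application of the ≥10 mutations/Mb threshold described above can be illustrated by counting patients who are called TMB-H only under tumor-only filtering. This is a minimal sketch under stated assumptions: the patient identifiers and TMB values are invented for the example, and only the threshold comes from the text.

```python
from typing import Dict, List

TMB_HIGH_THRESHOLD = 10.0  # mutations/Mb, the threshold quoted in the text


def is_tmb_high(tmb: float) -> bool:
    """Classify a TMB value against the >= 10 mutations/Mb cutoff."""
    return tmb >= TMB_HIGH_THRESHOLD


def discordant_high_calls(paired_tmb: Dict[str, float],
                          tumor_only_tmb: Dict[str, float]) -> List[str]:
    """Return patients called TMB-H by tumor-only filtering but not by
    paired germline filtering."""
    return [pid for pid in paired_tmb
            if is_tmb_high(tumor_only_tmb[pid]) and not is_tmb_high(paired_tmb[pid])]


# Hypothetical per-patient TMB values (not data from the study).
paired_tmb = {"P1": 5.5, "P2": 6.1, "P3": 11.2}
tumor_only_tmb = {"P1": 9.8, "P2": 12.4, "P3": 15.0}

print(discordant_high_calls(paired_tmb, tumor_only_tmb))  # ['P2']
```

Here P2 crosses the cutoff only because of inflation, whereas P3 is TMB-H under both approaches and P1 is TMB-H under neither.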
Mutations are a surrogate for neoantigens. Not all mutations are expressed, presented by MHC proteins, and recognized by the adaptive immune system for elimination. Furthermore, frameshifts (ref. 8) or chromosomal rearrangements (ref. 9) may result in more potent neoantigens than single nucleotide substitutions. TMB is an appropriate step toward the application of immunotherapy; however, additional work that helps us to understand the quality of mutations rather than the quantity may refine this approach (ref. 10). Just as mutations are a surrogate for neoantigens, self-reported race is a poor surrogate for geographical ancestry and individual polymorphisms. Even though race is a social construct that fails to encompass all the complexities of one's identity and social determinants of health, we felt race was important to investigate in the context of TMB given the use of population DBs for variant filtering.
Clinicians who rely on TMB calculated from tumor-only sequencing as a biomarker for patient selection to receive ICIs need to be aware of the potential for inflated TMB values, especially in patients who are under-represented in the public genetic variant DBs.

Variant calling
Sequencing reads of the tumor and patient-matched germline exomes were downloaded from the Sequence Read Archive (SRA). The CoMMpass℠ study provided the list of somatic mutations only. In order to examine the

Statistical testing
The TMB values were approximately normally distributed (Supplementary File 2 and Supplementary Fig. 1). For each of the four filtering criteria, TMB was compared between Black and White individuals using Student's t-test. A two-way ANOVA model was used to measure the interaction between two independent variables: race (Black or White) and filtering criterion (the four criteria described above) (TMB ~ race + filtering + race:filtering).
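The quantities behind these tests can be sketched with the Python standard library alone. This is not the authors' analysis code: the helper names and the toy TMB values are hypothetical, and the second function computes only the difference-in-differences of cell means that an interaction term such as race:filtering captures. It does not produce the ANOVA F statistic or p-value, which would normally come from a statistics package.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Tuple


def student_t(a: List[float], b: List[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def interaction_contrast(cells: Dict[Tuple[str, str], List[float]],
                         filter_name: str,
                         reference: str = "paired") -> float:
    """(Black - White) mean gap under one filter minus the gap under the reference filter."""
    gap = mean(cells[("Black", filter_name)]) - mean(cells[("White", filter_name)])
    ref_gap = mean(cells[("Black", reference)]) - mean(cells[("White", reference)])
    return gap - ref_gap


# Hypothetical toy values, not data from the study.
t_stat, dof = student_t([6.0, 7.0, 8.0], [4.0, 5.0, 6.0])
print(round(t_stat, 3), dof)

cells = {
    ("Black", "paired"): [6.0, 6.2],
    ("White", "paired"): [5.4, 5.6],
    ("Black", "db"): [12.0, 13.0],
    ("White", "db"): [8.0, 9.0],
}
print(round(interaction_contrast(cells, "db"), 2))
```

A contrast near zero would mean that database filtering shifts both groups equally; a large positive value corresponds to the racially disparate inflation that the interaction term is designed to detect.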

DATA AVAILABILITY
The data generated and analyzed during this study are described in the following figshare data record