Analysis of tumor mutational burden: correlation of five large gene panels with whole exome sequencing

Outcome of immune checkpoint inhibition in cancer can be predicted by measuring PDL1 expression of tumor cells. Search for additional biomarkers led to tumor mutational burden (TMB) as surrogate marker for neoantigens presented. While TMB was previously determined via whole exome sequencing (WES), there have been approaches with comprehensive gene panels as well. We sequenced samples derived from formalin-fixed tumors, a POLE mutated cell line and standard DNA by WES and five different panels. If available, normal tissue was also exome sequenced. Sequencing data was analyzed by commercial software solutions and an in-house pipeline. A robust Pearson correlation (R = 0.9801 ± 0.0167; mean ± sd; N = 7) was determined for the different panels in a tumor paired normal setting for WES. Expanded analysis on tumor only exome sequenced samples yielded similar correlation (R = 0.9439 ± 0.0632; mean ± sd; N = 14). Remaining germline variants increased TMB in WES by 5.761 ± 1.953 (mean ± sd.; N = 7) variants per megabase (v/mb) for samples including synonymous variants and 3.883 ± 1.38 v/mb for samples without synonymous variants compared to tumor-normal paired calling results. Due to limited sample numbers in this study, additional replication is suggested for a clinical setting. Remaining germline variants in a tumor-only setting and artifacts caused by different library chemistries construction might affect the results.

Immune checkpoint inhibitors significantly increased survival across several tumor entities including non-smallcell lung cancer (NSCLC) and melanoma 1,2 . However, the response rate is highly variable even within a certain tumor entity, ranging from complete to no response. Thus, there is an urgent need for new predictive biomarkers to identify patients who are most likely to respond [3][4][5] .
Currently, two biomarkers are used to select patients: PDL1 expression as measured by immunohistochemistry is approved for companion and complementary testing prior to immunotherapy by the European Medicines Agency (EMA) and the Food and Drug Administration (FDA). In 2017, FDA also granted approval for the immunotherapy of solid tumors showing high microsatellite instability. Despite several attempts for standardization it seems that PDL1 immunohistochemistry assays alone remain insufficient as also some patients that are negative for PDL1 expression revealed response to immunotherapy [6][7][8][9] .
Retrospective studies showed the predictive ability of tumor mutational burden (TMB) to discriminate responders from non-responders across several tumor entities [10][11][12] . It is hypothesized that tumors with a higher mutation burden are likely to express and present more neoantigens and thereby induce a stronger immune response 13 . Supporting evidence came from the identification of other genome-related markers for response like mutations in genes related to DNA repair 14 and deficiencies in the mismatch repair system 15 .
In previous clinical studies, whole exome sequencing (WES) was used for the measurement of TMB 16,17 . WES tumor versus normal DNA sequencing is still taken as basis for the implementation of alternative methods. Due to its higher costs, the limited tissue availability and the need of sequencing matched normal DNA, WES is of Oncomine tumor mutation load assay. 20 ng DNA quantified with the Qubit dsDNA HS Assay (Thermo Fisher Scientific, Waltham, MA, USA) on the Qubit 2.0 Fluorometer (Thermo Fisher Scientific) was used for library preparation with the Oncomine Tumor Mutation Load Assay (Thermo Fisher Scientific) according to manufacturer's instructions. Library concentrations were quantified with the Ion Library TaqMan Quantification Kit (Thermo Fisher Scientific). Libraries were loaded on the Ion Chef for template preparation and chip loading using the Ion 540 Kit (Thermo Fisher Scientific), followed by sequencing on the Ion S5 XL System (Thermo Fisher Scientic).
Quality of the Ion S5 XL run was assessed with the Ion Torrent Suite 5.10 (Thermo Fisher Scientific). Data were analyzed with the Ion Reporter 5.10 (Thermo Fisher Scientific).
NEOplus v2 RUO panel. 200 ng DNA were quantified with the Qubit dsDNA HS Assay (Thermo Fisher Scientific) on the Qubit 2.0 Fluorometer (Thermo Fisher Scientific) and sheared on the Covaris E220 Focusedultrasonicator (Woburn, MA, USA) using the 8  For the custom panel, custom capture probes were designed using SureDesign (Agilent) for the target regions of 362 genes (Supplementary information 1). For library preparation SureSelect XT HS Reagent Kit (Agilent) was used following manufacturer's instructions. In brief, pre-enriched adapter-ligated libraries were prepared. Subsequently, custom capture probes or Human all Exon v6 capture probes were hybridized to target sequences to allow for sequence enrichment using streptavidin beads. Post-enriched libraries were quantified, pooled and sequenced on a NextSeq 500 (Illumina Inc., San Diego, CA, USA).
Quality of the NextSeq 500 (Illumina) sequencing runs were assessed with the Illumina Sequencing Analysis Viewer (Illumina). FASTQ files were generated using bcl2fastq Conversion Software (Illumina). Data were further analyzed by an in-house developed pipeline based on Mutect2.
Software pipeline for variant calling and filtering. Sequencing data was stripped from adapters by skewer 29 followed by primary alignment via bwa 30 and sorting with sambamba 31 . Read grouping and calling of molecular consensus reads was accomplished with fgbio (https ://githu b.com/fulcr umgen omics /fgbio ). Read groups were converted back to fastq with bedtools bamtofastq 32 and realigned with bwa.
Variants were called with GATK 4.0.11 Mutect2 33,34 . Raw calls were annotated with snpEFF 35 and filtered for exonic variants with SnpSift 36 . Only variants annotated as indels (insertion/deletion), SNVs (single nucleotide variants), frameshifts, affecting start/stop codons and splice site altering were allowed to pass the filter. We restricted the cutoff distance to ± 2 base pairs (bp) at the exon/intron boundary for splice sites. We also constructed a dataset that in addition contained coding synonymous variants.
Resulting vcf files were filtered by general population frequency in the non-TCGA version of the ExAC r0.3.1 database 37 , allowing only variants with minor allele frequencies < 0.01% to pass. In addition, we removed variants matching the 20,170,710 version of dbSNP150 unless they were found in the COSMIC v83 database. The threshold for variants in the tumor samples were 5 reads total as a minimum and an allelic fraction of 5% or more. At least 90% of the reads had to have a mapping quality > 1. As a measure to filter out sequencing artifacts, we used an in-house python script to screen for traces of a variant in a panel of 21 normals that had been subjected to exome sequencing with the SureSelect XT HS Exome kit (Agilent). If a variant could be detected in any normal sample, its allelic fraction (Af normal ) was compared to the one found in the tumor (Af tumor ). Only variants surpassing a ratio Af tumor Af normal > 4 in all 21 tumor-normal combinations were allowed to pass. If a matching normal was available, its alignment file was also added to the panel of normals to allow for a separate paired calling analysis.
The GATK 3.8 DepthOfCoverage-tool was used to determine the number of exonic basepairs with a coverage > 15 in each sample which was then used for TMB calculation.
Total coverage and average coverage for all targeted regions includes non-coding and intronic DNA on the deduplicated alignments. Total coverage was determined with bedtools coverage, average coverage was determined with GATK 3.8 DepthOfCoverage.
QIAseq TMB panel. 40  In addition to the Qiagen software, we also analyzed the data with our in-house pipeline (see description above) with minor modifications regarding the extraction of the umi (unique molecular index). Due to the different chemistry for library preparation, we also sequenced 15 normal samples independent from tumors that served as a panel of normal.
Variant annotation for filtering was done with Mutect2 FilterMutectCalls. Read_position and strand_artifact filter flags were removed for subsequent analysis. Further we employed the LearnReadOrientationModel of GATK to filter strand biases. Ethics approval and consent to participate. FFPE tissue samples were obtained as part of routine clinical care under approved ethical protocols complied with the Ethics Committee of the Medical Faculty of the University of Cologne, Germany and the study was approved by the same Ethics Committee (Ethics-No. 13-091, BioMaSOTA) and written informed consent was obtained from all patients before enrollment into the study. This resulted in the removal of somatic variants and lead to a false, low TMB value in paired analysis when compared to tumor only results. For further analysis we excluded the contaminated normal sample. This left us with 14 samples available for comparison between the different TMB panels and WES (tumor only). Out of those, 7 had also WES matching normal samples.

Coding synonymous variants in TMB evaluation.
Regarding the evaluation of TMB we tested the hypothesis whether coding synonymous variants, though not leading to the exposition of neoantigens, can still serve as a proxy for variants with an expected impact on protein structure. First, samples were tested for correlation of TMB values derived from non-synonymous variants and TMB values of only coding synonymous variants, when software pipelines allowed for the differentiation between coding synonymous and nonsynonymous variants. This examination showed a strong correlation (R = 0.9779 ± 0.0179; mean ± sd; N = 14). Results for the different panels are shown in Fig. 1. Due to the overlap between both datasets, the association was even stronger, when all somatic variants were compared to only non-synonymous ones (R = 0,9971 ± 0,004; mean + sd; data not shown). We noticed that the ratio of called synonymous to non-synonymous variants varied between the different panels (TruSight Oncology 500 (Illumina) = 0.2815 ± 0.1611; NEOplus RUO panel (NEO New Oncology) = 0.4715 ± 0.3488; SureSelect XT HS custom TMB panel (Agilent) = 0.3856 ± 0.2122; WES (tumor only) = 0.3833 ± 0.068; QIAseq TMB panel (Qiagen) Genomics workbench = 0.6222 ± 0.0825; QIAseq TMB panel (Qiagen) Mutect2 = 0.351 ± 0.0826; Oncomine Tumor Mutation Load assay (Thermo Fisher Scientific) = N/A, only non-synonymous variants reported). This suggests that the inclusion of silent somatic variants could have a technology-dependent impact on TMB values. We therefore based further analyses on both, TMB values excluding and including synonymous variants, respectively, to assess correlation. Reference point for correlation was always the TMB of the matching exome (non synonymous variants).
Comparison of gene panels against tumor paired normal WES. We next investigated how the presence of paired normal samples as the definition of the standard for WES data affects the TMB estimates. A strong correlation (R = 0.9801 ± 0.0167; mean ± sd; N = 7) between the tumor only analysis of all the different panels and paired tumor-normal analysis of the exomes could be observed ( Table 2). Correlations of all panels were significant after Bonferroni correction for multiple testing (< alpha 0.05). The different panels showed comparable distributions of TMB values relative to their averages, with the exception of the analysis of QIAseq TMB panel (Qiagen) data when analysed with our Mutect2 based pipeline displaying a smaller dynamic range (Fig. 2b).
Based on a commonly used cutoff value of > 10 variants / MB we calculated Overall Percent Agreement (OPA), Positive Percent Agreement (PPA) and Negative Percent Agreement (NPA) of the different panels after correction by the respective conversion factors in comparison to tumor paired normal exome sequencing (Supplementary information 3). Average OPA was 88.87 ± 9.27, PPA 88.97 ± 11.79 and NPA 88.46 ± 21.07 (mean ± sd; N = 7).  (Table 2), when coding synonymous variants were included.
As the exomes were analyzed both in paired mode as well as for tumor only, the number of variants that passed filtering in both modes were compared to get an estimation of potential germline variants that had passed in tumor only analysis. We calculated a difference of 147.625 ± 53.5092 (mean ± sd.; N = 8) for nonsynonymous variants and 219 ± 75.8897 variants including synonymous ones. TMB for WES tumor was increased    We also extended the analysis to all 14 samples and evaluated the correlation between the different panels and exome tumor only TMB values. Correlation was slightly reduced to 0,9439 ± 0,0632 (mean ± sd; N = 14) ( Table 2). Both the Ion Reporter 5.10 (Thermo Fisher Scientific) analysis as well as GATK in our own software pipeline on QIAseq TMB panel (Qiagen) data reported increased TMB values for sample 4 (Supplementary information 3). In addition, the Thermo-Fisher software reported an increased amount of variants suspected to be FFPE artifacts (data not shown).

Outliers in tumor only
For hybridization-based and single primer extension chemistry, it is possible to identify FFPE artifacts over a strand imbalance of the variant allelic fraction. An increased number of warnings from GATKs strand_bias filter was observed, when analyzing the QIAseq TMB panel (Qiagen) (data not shown). This might indicate an increased number of artifacts, e.g. derived from FFPE treatment during sample preparation or as an alternative hypothesis from primer artifacts that can occur during multiplex PCR. We used the experimental LearnReadOrientationModel tool of GATK to distinguish artifacts from real variants. Filtering the QIAseq TMB panel (Qiagen) data by the orientation filter increased correlation to the WES tumor only data from 0.875 to 0.9437 for variants including coding synonymous and 0.887 to 0.9412 for only non-synonymous variants, mainly by reducing the variant calls in sample 4. However, when applying the same procedure upon the smaller paired tumor-normal dataset, which does not include sample 4, correlation dropped to 0.8241/0.7330 (incl. synonymous / non-synonymous). This could be explained by much more stringent filter criteria for strand biases in this approach that might also result in unwanted filtering in samples less affected by strand imbalance prone artifacts. We also sought to determine the ratio of C:G > T:A transitions compared to the total value of variants which is a good indicator of FFPE artifacts. For the QIAseq panel, we calculated a ratio of 0.39 ± 0.11, while WES data had a ratio of 0.49 ± 0.14. For sample 4 the ratio was 0.51 (QIAseq) and 0.48 (WES).
Ratio for the QIAseq panel decreased to 0.37 ± 0.097 and in sample 4 it dropped to 0.43, when we decreased the cutoff VAF for variant filtering to 2% (data not shown).
In contrast, the CLC Genomics Workbench v12 (Qiagen) output for sample 4 in a previous version of the workflow (v. 1.35) stuck out with a much lower TMB of 22.8, as the software was using the total panel size as the denominator for the calculation of the TMB value which resulted in no normalization for the low coverage in the data of sample 4. After applying workflow 1.47 which fixed this issue during the course of our study, TMB values of the Genomics Workbench were in general complying with hybrid capture based assays.
All TMB gene panels except for WES showed reduced normalized coverage for sample 4 (Supplementary information 4). While raw sequencing output for this sample was already below average for a number of panels, deduplication further reduced the sample coverage across different panels even when the amount of input data was balanced across the samples (Supplementary information 2), which indicates a low library complexity.

Discussion
A vast number of different factors can influence TMB values, starting with the size of the panel, tumor entity library chemistry, sequencing platform and the specific genomic regions covered by the panel 39 . Therefore, none of analyses should be considered as a ground truth. Previous results suggested that ~ 1.1 MB of exonic coding regions can be considered as a sufficient size to reliably asses TMB 40 . All tested panels fulfil this criterion. It is not clearly shown yet whether it is best to use only non-synonymous coding variants as a more direct measurement for displayed neoantigens in the tumor or if coding synonymous variants can serve as a proxy for these values 41 . Traditionally non-synonymous variants have been mostly used for TMB estimation 22 as they have a direct influence upon protein structure and thereby neoantigen presentation of the cell. We observed a strong correlation between coding non-synonymous and coding synonymous variants across the different panels, showing that including synonymous variants might increase confidence in the determined TMB value due to higher, but still similar specific values. However, we cannot draw a final conclusion due to the limited cohort size, varying tumor entities and lack of data for patient outcome of targeted immunotherapy. Except for the NEO panel, only minor changes of correlation could be observed between analysis that included or excluded synonymous variants due to the fact that synonymous variants suffer from background noise like germline variants and artifacts in the same way that non-synonymous variants do. www.nature.com/scientificreports/ Analysis of the different panels in 7 independent samples showed a robust correlation to the results from paired tumor-normal WES. It is not surprising that WES for tumor only showed the highest correlation with WES tumor-normal, as the missing normal tissue is the only difference in the analysis. However, the solutions of Illumina and NEO New Oncology (if synonymous variants were included) had similar correlation values in comparison to WES data. Comparing WES in tumor only and tumor-normal paired calling analysis allowed us to determine the overhead of variants called for tumor only. As the data was filtered against the same panel of normals, these differences can be considered as unfiltered germline variants, rather than artifacts. A correlation to the total number of somatic variants was not observed, suggesting that somatic variants and germline variants were in general properly separated. It is important to note that a relatively constant but slightly deviating number of germline variants present in a tumor-only analysis has a proportionally higher influence on the outcome of samples with lower TMB-values. When considering tumor only analyses results or alternatively cutoff values for classification into low, intermediate and high, TMB must be corrected by the average difference to make them more comparable to paired tumor-normal calling. Standard deviation for the observed difference in germline variant numbers however poses a factor of uncertainty in tumor only analysis and will play a role in individual therapeutic decisions when it shifts classification of the TMB value.
It is interesting to speculate about reasons for different levels of germline noise and how to reduce or estimate it. Ethnic background and therefore representation of variants in germline databases like dbSNP and ExAC have been shown to play a role 42 . This effect might get reduced as more, specifically non-European individuals are sequenced and their variants get incorporated into germline databases. Meanwhile ethnicity of the patient should be taken into account and germline background levels for different ethnicities need to be established for precise diagnostic TMB evaluation. Long-term developments might therefore focus on determining haplotypes rather than ethnicity of the individual to estimate the probability of germline or somatic events with more precision. While this would still lack behind sequencing of a paired normal, it could increase signal to noise ratio of the analysis.
As the comparison of TMB panel sequencing data to tumor-normal paired WES calling gave the impression of generally similar results for all TMB panels, our analysis was extended with 7 additional samples where a paired normal tissue for the WES was missing. Therefore, TMB panel data was directly compared to tumor only WES data. In one of the samples, which was isolated from the POLE mutated and microsatellite-instable cell line CW-2, we obtained devious results for two panels and two pipelines. We chose to keep the sample in the analysis as an example of the robustness of the different pipelines. Tumor samples of low DNA quality occur on a daily basis in pathology labs and often there is no replacement available at all. The updated version of the CLC Genomics Workbench v12 (Qiagen), though issuing a warning message regarding the low coverage still emitted TMB values similar to the hybrid capture based solutions. For time-critical clinical practice it is noteworthy, that samples with a high TMB likely will be estimated correctly, even if the sample gets heavily undersequenced.
Fixation artifacts are a complicated issue to deal with. C:G > T:A transitions have been known for a long time to appear as a predominant artifact in formalin fixed tissues 43,44 . Analysis of the Oncomine Tumor Mutation Load assay (Thermo Fisher Scientific) with the Ion Reporter 5.10 (Thermo Fisher Scientific) seeks to detect the deamination proportion by evaluating the amount of C:G > T:A variants.
While this seems to work in the majority of samples, it failed in one challenging case. Other tested solutions distinguish both strands of the DNA during amplification, and filter C > T substitutions which only occur on either + or -strand of the DNA. Both hybridization-based as well as single primer extension assays thereby rely upon a certain amount of the complementary strand to be present for this evaluation. In addition, the TruSight Oncology 500 (Illumina) hybridization-based chemistry allows for duplex calling due to its adapters, which incorporate a double stranded unique molecular index (umi). This allows for identification of the matching partner from the same fragment, if captured and sequenced over its reverse complement umi.
The QIASeq TMB assay (Qiagen), compared to the hybridization-based solutions, seems to be prone for a different kind of artifact that still needs to be determined. FFPE artifacts have been described as predominantly present in variants with lower allelic fractions 45 . On the one hand, we do observe increased strand bias. There is also decreased correlation to exome sequencing at VAF cutoff 2% while correlation in our Agilent custom hybrid capture panel increases under these circumstances. This might be a hint for more false positive variants with smaller VAF in the QIAseq data. However we do not observe the expected C:G > T:A transitions. Neither are they more prominent in QIAseq data when compared to WES nor do they increase in the lower VAF spectrum. An alternative explanation might be artifacts based on the priming site of gene specific primers during the enrichment PCR of the library preparation.
Regarding the observed strand biases, our own software pipeline yet seems to lack precision with Qiagens primer extension chemistry.
Due to the limited sample size, interchangeability between different assays should not be suggested. A switch in routine diagnosis from WES to a panel should be well prepared by analyzing a larger batch of samples of the specific tumor entity with both methods. A recent in silico study also suggested that certain tumor entities might influence panel based TMB assessment more than others 39 . Different cutoff values will need to be applied to calculate agreement between WES and the preferred test as optimal treatment outcome is associated with tumor subtype specific cutoffs 46 . One of the main questions will be which test to choose. Agreement with established methods like WES should be considered, but for the most part did not appear to be an issue in this study. Establishment of a custom panel requires additional work, but offers the benefit of screening genomic regions of interest, which might spare the user running an additional assay. On the other hand required amount and quality of DNA as well as turnaround time and ability to cope with low quality input material also play an important role in routine diagnosis. www.nature.com/scientificreports/ conclusion In addition to PDL1 testing, estimation of TMB has been shown to be an important biomarker for the outcome of targeted immunotherapy. As we showed in our study, available assays and software solutions are in general comparable. Switching from one assay to another therefore might only require either adjustments for cutoff values of high, intermediate and low TMB values or alternatively a direct translation in the form of a linear equation. Including coding synonymous variants in the TMB analysis did not improve correlation for the different assays/ pipelines in general. While variations in germline background appeared to be manageable in our study in tumor only WES, we cannot draw a conclusion for the other assays.
We observed a complex behavior of tested solutions with regard to artifacts related to DNA fixation or sequencing, that manifest in certain basepair exchanges and strand biases. Not only seem some wet lab assays to be more prone for artifacts, their output data also provides different opportunities for error correction during downstream analysis. Our analysis showed that these artifacts need to be evaluated and addressed properly during data processing. The design and analysis of our own panel showed, that it is in fact possible to design custom solutions for assay and data processing.

Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.