XPC deficiency increases risk of hematologic malignancies through mutator phenotype and characteristic mutational signature

Recent studies demonstrated a dramatically increased risk of leukemia in patients with a rare genetic disorder, Xeroderma Pigmentosum group C (XP-C), characterized by constitutive deficiency of global genome nucleotide excision repair (GG-NER). The genetic mechanisms of non-skin cancers in XP-C patients remain unexplored. In this study, we analyze a unique collection of internal XP-C tumor genomes including 6 leukemias and 2 sarcomas. We observe a specific mutational pattern and an average of 25-fold increase of mutation rates in XP-C versus sporadic leukemia which we presume leads to its elevated incidence and early appearance. We describe a strong mutational asymmetry with respect to transcription and the direction of replication in XP-C tumors suggesting association of mutagenesis with bulky purine DNA lesions of probably endogenous origin. These findings suggest existence of a balance between formation and repair of bulky DNA lesions by GG-NER in human body cells which is disrupted in XP-C patients.

1. The authors do provide novel insight into mutagenic patterns associated with XPC-deficiency in internal cancers. They reveal transcriptional bias in multiple tumor types at a genomic level using sophisticated analyses. However, they primarily provide evidence of a potential mechanism rather than testing the proposed mechanism itself. Additionally, positive controls (for example, UV-damaged XPC-deficient skin cancer samples) are lacking from the analyses; including them would strengthen the experimental design.
2. The authors do not discuss why pyrimidines on untranscribed strands would be repaired (but purines on untranscribed strands would) even if GG-NER was not functional due to XPC-deficiency. Discussion of the differences between pyrimidines and purines with respect to transcriptional bias is warranted. Related to this, in lines 137 to 139 the authors mention that the pattern they see could be the result of excess pyrimidine mutations or reduced purine mutations, but they do not discuss this in detail, and they do not provide evidence to support their comments. They should provide additional data to support a potential explanation.

Fig. 3, if purines are the focus (there are excess purine mutations on the untranscribed strand due to XPC deficiency) then purines should be listed first (e.g. red is transcribed for purines, untranscribed for pyrimidines) rather than pyrimidines listed first; Figure 3F should specify if pyrimidines or purines are the nucleotides on the untranscribed or transcribed strand.

Fig. 4A states that the pattern seen is due to the absence of GG-NER. However, this is an assumption and would be more appropriate in the discussion section than a figure description, and/or supported by additional data.

5. There are several small typos that should be corrected: line 50 - "untranscribed strands"; line 153 - "untranscribed strands"; line 155 - "strands"; 158 - "damage is on purine bases"; 176 - "within a distance"; 179 - "of the existence"

Reviewer #3 (Remarks to the Author): Expert in leukemia genomics
This is an interesting study that suggests that the genetic defect in XPC promotes a singular mutational signature with strand bias, which has potential mechanistic implications. I would suggest the following:
It would be more compelling that the signature is pathogenic if the underlying mutated gene(s) are perturbed in an experimental model and this is sequenced to show that this perturbation induces the same signature.
While the signature may be characteristic, it is not a "specific" signature as the title is written to suggest; it is an established COSMIC signature seen in other contexts. While it may be that bulky nucleotides are responsible, it would be helpful if the authors could explore/explain WHY this specific pattern of mutated residues is observed.
The correlation with epigenetic state is potentially interesting but underdeveloped as correlation has only been performed with individual marks. It would be helpful if the authors correlate with chromatin state analysis that combines marks (e.g. ChromHMM).
The genomic description is focused on mutation burden and signature but is otherwise limited: what driver genes or others are mutated by sequence or structural variation? This could and should be examined and presented in more detail (and compared to sporadic tumors).
The actual name of the causal gene (Table 1, and in the text) should be stated.

Reviewer #5 (Remarks to the Author): Expert in DNA damage and genomics
The manuscript by Yurchenko and colleagues is a study of somatic mutations in 6 leukemias and 2 sarcomas in xeroderma pigmentosum group C (XP-C) patients. Since the absent protein, XPC, is important for global excision repair, we would expect that non-UV related tumors in the patients will have distinctive mutational patterns. Indeed, this paper reports elevated mutation rates and a distinctive pattern of somatic single base mutations (single base substitutions, SBSs) in these tumors. The paper also reports *extremely* strong transcriptional strand asymmetry, which is consistent with the operation of transcription coupled nucleotide excision repair in the absence of XPC-mediated global excision repair. The paper also substantiates a pattern of mutations consistent with elevated error-prone translesion synthesis on the lagging versus leading replication strand opposite purines with bulky adducts.
I believe the conclusions of the paper are substantially correct, and the analysis is both careful and thorough. Some notable strengths include the analysis of interaction of the mutation density with transcriptional and replication strand bias. The extremely strong transcriptional strand asymmetry is important confirmatory evidence that this signature is free of sequencing artifacts and distinct from SBS8. This paper constitutes an important advance in the study of the function of XPC as reflected by the consequences of its absence.
I have no concerns about the statistical analyses or bioinformatic analyses except for a request to provide the code for the analysis of clustered mutations, as noted below.

ESSENTIAL CHANGES NEEDED
The authors absolutely must provide the final lists of filtered variant calls either as supplementary information, or, if these are considered protected information, on an appropriate archive.
The authors absolutely must provide the single base substitution (SBS), doublet base substitution (DBS) and indel spectra in numerical form (i.e. in Excel or .CSV files or as VCFs).
Presence of the data in EGA is essential. I would encourage the authors to submit immediately (the data can be embargoed). Getting data uploaded and released on EGA can be slow.

OTHER MAJOR COMMENTS
It is essentially impossible to give a precise verbal description of the analysis of clustered mutations (lines 608-637). I strongly suggest providing the code.
The double base substitution and indel substitution patterns are also quite distinctive. This is worth a mention in the results.

PRESENTATION COMMENTS
The paper is quite dense, with many long and slightly damaged sentences. The authors would do well to follow the advice I give trainees in my lab: If English is not your mother tongue, do not write long sentences. Actually, even if it is your mother tongue, do not write long sentences. Why make it harder for others to understand your work?
Some specific suggestions on presentation:
The point made on lines 168 through 172 is quite interesting but rather buried. Suggest starting a new paragraph at "TLS polymerases which are recruited…"
At the beginning of results suggest that you clarify that you sequenced cancer and non-cancer to identify cancer-specific somatic mutations.
Lines 28-29 "conferring to its elevated incidence and early appearance". I think the intent is ", which we presume leads to the elevated incidence and early appearance of leukemias in these patients"
Line 42 not sure what "and XP variant" refers to.
Line 44 "on" -> "into"
Line 65, delete "the" in "the age"
Lines 237-239 seem out of sync with the paper, as the Comoros population is not genetically North African and the patient had the Comorian IVS12 variant.

Reviewer #1 (Remarks to the Author): Expert in DNA damage
Title: XPC deficiency increases risk of hematologic malignancies through mutator phenotype and specific mutational signature
While this is an interesting and timely study, the impact could be strengthened by providing some mechanistic insight to support and solidify their conclusions.
Comments/Concerns:
1. The authors do provide novel insight into mutagenic patterns associated with XPC-deficiency in internal cancers. They reveal transcriptional bias in multiple tumor types at a genomic level using sophisticated analyses. However, they primarily provide evidence of a potential mechanism rather than testing the proposed mechanism itself.
2. The authors do not discuss why pyrimidines on untranscribed strands would be repaired (but purines on untranscribed strands would) even if GG-NER was not functional due to XPC-deficiency. Discussion of the differences between pyrimidines and purines with respect to transcriptional bias is warranted. Related to this, in lines 137 to 139 the authors mention that the pattern they see could be the result of excess pyrimidine mutations or reduced purine mutations, but they do not discuss this in detail, and they do not provide evidence to support their comments. They should provide additional data to support a potential explanation.

In our case we observe no difference between mutation rates in pyrimidines on the transcribed strand (or purines on the untranscribed strand) in genic and intergenic regions, but we observe a strong decrease of mutations in purines on the transcribed strand (or pyrimidines on the untranscribed strand) in genes as compared to intergenic regions (Figure 3f). This analysis is compatible with the decrease of mutations from purines on the transcribed strand resulting from the activity of TC-NER. A decrease of mutation rates on the untranscribed strand as compared to the transcribed strand or intergenic regions is highly unlikely and, to our knowledge, has not been described in the literature; therefore we do not pursue this possibility.
See also the response to the previous comment, where the symmetrical situation for UV-induced mutations from pyrimidines in XP-C skin cancer is discussed. We added to the Results section an explanation of patterns in XP-

Fig. 4A states that the pattern seen is due to the absence of GG-NER. However, this is an assumption and would be more appropriate in the discussion section than a figure description, and/or supported by additional data.

The description in
Response: We agree with the reviewer, and modified the legend using more accurate wording: ".. which is compatible with the absence of GG-NER" (L391).
In this analysis we hypothesized that, due to the absence of GG-NER, we should observe a strong difference in mutation rates between transcribed and untranscribed strands, particularly in early replicating regions known to be actively

5. There are several small typos that should be corrected: line 50 - "untranscribed strands"; line 153 - "untranscribed strands"; line 155 - "strands"; 158 - "damage is on purine bases"; 176 - "within a distance"; 179 - "of the existence"
Response: Corrected.

Reviewer #3 (Remarks to the Author): Expert in leukemia genomics
This is an interesting study that suggests that the genetic defect in XPC promotes a singular mutational signature with strand bias, which has potential mechanistic implications. I would suggest the following:
2. While the signature may be characteristic, it is not a "specific" signature as the title is written to suggest; it is an established COSMIC signature seen in other contexts. While it may be that bulky nucleotides are responsible, it would be helpful if the authors could explore/explain WHY this specific pattern of mutated residues is observed.
Response: Signature "C" had moderate cosine similarity (0.86) with Signature 8 from the pan-cancer analysis (COSMIC). It differed in specific trinucleotide contexts (VpCpT > D and NpCpT > T, where V designates A,C,T and D designates A,G,T; Figure 2a) and, more importantly, had a much more pronounced transcriptional bias. We therefore preferred to describe exactly the similarities and differences between the signatures instead of assigning Signature "C" to Signature 8. Reviewer 5 also noticed the important differences between Signature "C" and Signature 8. We corrected the title, changing the word "specific" to "characteristic".
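For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a cosine similarity between two mutational spectra can be computed. It is an illustration only and not the code used in the manuscript: the function name `cosine_similarity` and the short toy vectors are hypothetical, and real SBS spectra would have 96 trinucleotide channels in a fixed order.

```python
import math


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine similarity between two spectra given as equal-length numeric sequences.

    Hypothetical helper for illustration; both spectra must list the same
    mutation channels in the same order.
    """
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must have the same number of channels")
    dot = sum(x * y for x, y in zip(spectrum_a, spectrum_b))
    norm_a = math.sqrt(sum(x * x for x in spectrum_a))
    norm_b = math.sqrt(sum(y * y for y in spectrum_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero spectrum")
    return dot / (norm_a * norm_b)


# Toy, made-up channel proportions (NOT data from the study).
toy_signature_1 = [0.30, 0.10, 0.25, 0.05, 0.20, 0.10]
toy_signature_2 = [0.20, 0.15, 0.30, 0.10, 0.15, 0.10]

print(round(cosine_similarity(toy_signature_1, toy_signature_2), 3))
```

A value of 1 means the two spectra have identical channel proportions, and a value of 0 means they share no mutated channels. The measure ignores strand information, which is one reason two signatures can be fairly similar by this metric yet differ in transcriptional bias.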
3. The correlation with epigenetic state is potentially interesting but underdeveloped as correlation has only been performed with individual marks. It would be helpful if the authors correlate with chromatin state analysis that combines marks (e.g. ChromHMM).
Response: Following this valuable suggestion, we performed an additional analysis

Response: We agree with the reviewer that this is an important piece of information but this question has already been discussed in the recent paper by us which described

Reviewer #5 (Remarks to the Author): Expert in DNA damage and genomics
The manuscript by Yurchenko and colleagues is a study of somatic mutations in 6 leukemias and 2 sarcomas in xeroderma pigmentosum group C (XP-C) patients. Since the absent protein, XPC, is important for global excision repair, we would expect that non-UV related tumors in the patients will have distinctive mutational patterns.
Indeed, this paper reports elevated mutation rates and a distinctive pattern of somatic single base mutations (single base substitutions, SBSs) in these tumors. The paper also reports *extremely* strong transcriptional strand asymmetry, which is consistent with the operation of transcription coupled nucleotide excision repair in the absence of XPC-mediated global excision repair. The paper also substantiates a pattern of mutations consistent with elevated error-prone translesion synthesis on the lagging versus leading replication strand opposite purines with bulky adducts.
I believe the conclusions of the paper are substantially correct, and the analysis is both careful and thorough. Some notable strengths include the analysis of interaction of the mutation density with transcriptional and replication strand bias. The extremely strong transcriptional strand asymmetry is important confirmatory evidence that this signature is free of sequencing artifacts and distinct from SBS8. This paper constitutes an important advance in the study of the function of XPC as reflected by the consequences of its absence.
I have no concerns about the statistical analyses or bioinformatic analyses except for a request to provide the code for the analysis of clustered mutations, as noted below.