Fanconi anaemia (FA), a model syndrome of genome instability, is caused by a deficiency in DNA interstrand crosslink repair resulting in chromosome breakage1,2,3. The FA repair pathway protects against endogenous and exogenous carcinogenic aldehydes4,5,6,7. Individuals with FA are hundreds- to thousands-fold more likely to develop head and neck (HNSCC), oesophageal and anogenital squamous cell carcinomas8 (SCCs). Molecular studies of SCCs from individuals with FA (FA SCCs) are limited, and it is unclear how FA SCCs relate to sporadic HNSCCs primarily driven by tobacco and alcohol exposure or infection with human papillomavirus9 (HPV). Here, by sequencing genomes and exomes of FA SCCs, we demonstrate that the primary genomic signature of FA repair deficiency is the presence of high numbers of structural variants. Structural variants are enriched for small deletions, unbalanced translocations and fold-back inversions, and are often connected, thereby forming complex rearrangements. They arise in the context of TP53 loss, but not in the context of HPV infection, and lead to somatic copy-number alterations of HNSCC driver genes. We further show that FA pathway deficiency may lead to epithelial-to-mesenchymal transition and enhanced keratinocyte-intrinsic inflammatory signalling, which would contribute to the aggressive nature of FA SCCs. We propose that the genomic instability in sporadic HPV-negative HNSCC may arise as a result of the FA repair pathway being overwhelmed by DNA interstrand crosslink damage caused by alcohol and tobacco-derived aldehydes, making FA SCC a powerful model to study tumorigenesis resulting from DNA-crosslinking damage.
Human data, including 34 Illumina whole-genome samples (22 tumour and 12 matched normal), 70 Illumina whole-exome samples (37 tumour and 33 matched normal), 13 PacBio whole-genome samples (9 tumour and 4 matched normal), 4 10x linked-read whole-genome tumour samples, 6 bulk RNA-seq tumour samples, 3 10x single-cell or single-nuclei RNA-seq tumour samples, 1 10x Visium spatial RNA-seq tumour sample, and 6 EPIC 850K methylation array tumour samples generated in this study, are available in dbGaP under accession phs002652.v1.p1 via controlled access. Samples from four individuals (F10P1, F15P1, F28P1 and F34P1) did not have consent for release of raw sequencing data to a repository. These four tumour–normal whole-exome sample pairs are available from the Smogorzewska laboratory after fulfilling requirements to participate in IRB protocol #AAU-0112 at the Rockefeller University. Mouse sequencing data, including 32 bulk RNA-seq samples and 10 Illumina whole-genome samples, are available on the Sequence Read Archive under accession PRJNA753831 and the Gene Expression Omnibus under accession GSE195811. Source data are provided with this paper.
Code used in these studies is available from https://github.com/MathijsSanders/SangerLCMFiltering, https://github.com/MathijsSanders/AnnotateBRASS, https://github.com/MathijsSanders/ExtractSVChains, https://github.com/MathijsSanders/validateStructuralVariants, https://github.com/MathijsSanders/annotateSniffles, https://github.com/MathijsSanders/validateSVIllumina and https://github.com/MathijsSanders/overlapSnifflesPindel.
Auerbach, A. D. & Wolman, S. R. Susceptibility of Fanconi’s anaemia fibroblasts to chromosome damage by carcinogens. Nature 261, 494–496 (1976).
Sasaki, M. S. & Tonomura, A. A high susceptibility of Fanconi’s anemia to chromosome breakage by DNA cross-linking agents. Cancer Res. 33, 1829–1836 (1973).
Taylor, A. M. R. et al. Chromosome instability syndromes. Nat. Rev. Dis. Primers 5, 64 (2019).
Garaycoechea, J. I. et al. Alcohol and endogenous aldehydes damage chromosomes and mutate stem cells. Nature 553, 171–177 (2018).
Langevin, F., Crossan, G. P., Rosado, I. V., Arends, M. J. & Patel, K. J. Fancd2 counteracts the toxic effects of naturally produced aldehydes in mice. Nature 475, 53–58 (2011).
Pontel, L. B. et al. Endogenous formaldehyde is a hematopoietic stem cell genotoxin and metabolic carcinogen. Mol. Cell 60, 177–188 (2015).
Rycenga, H. B. & Long, D. T. The evolving role of DNA inter-strand crosslinks in chemotherapy. Curr. Opin. Pharmacol. 41, 20–26 (2018).
Alter, B. P., Giri, N., Savage, S. A. & Rosenberg, P. S. Cancer in the National Cancer Institute inherited bone marrow failure syndrome cohort after fifteen years of follow-up. Haematologica 103, 30–39 (2018).
Johnson, D. E. et al. Head and neck squamous cell carcinoma. Nat. Rev. Dis. Primers 6, 92 (2020).
Ceccaldi, R., Sarangi, P. & D’Andrea, A. D. The Fanconi anaemia pathway: new players and new functions. Nat. Rev. Mol. Cell Biol. 17, 337–349 (2016).
Kottemann, M. C. & Smogorzewska, A. Fanconi anaemia and the repair of Watson and Crick DNA crosslinks. Nature 493, 356–363 (2013).
Wang, A. T. & Smogorzewska, A. SnapShot: Fanconi anemia and associated proteins. Cell 160, 354–354.e351 (2015).
The Cancer Genome Atlas Network. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 517, 576–582 (2015).
Alter, B. P. et al. Squamous cell carcinomas in patients with Fanconi anemia and dyskeratosis congenita: a search for human papillomavirus. Int. J. Cancer 133, 1513–1515 (2013).
Hoskins, E. E. et al. The Fanconi anemia pathway limits human papillomavirus replication. J. Virol. 86, 8131–8138 (2012).
Kutler, D. I. et al. Human papillomavirus DNA and p53 polymorphisms in squamous cell carcinomas from Fanconi anemia patients. J. Natl Cancer Inst. 95, 1718–1721 (2003).
Sauter, S. L. et al. Oral human papillomavirus is common in individuals with Fanconi anemia. Cancer Epidemiol. Biomarkers Prev. 24, 864–872 (2015).
van Zeeburg, H. J. et al. Clinical and molecular characteristics of squamous cell carcinomas from Fanconi anemia patients. J. Natl Cancer Inst. 100, 1649–1653 (2008).
Menghi, F. et al. The tandem duplicator phenotype as a distinct genomic configuration in cancer. Proc. Natl Acad. Sci. USA 113, E2373–E2382 (2016).
Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
Willis, N. A. et al. Mechanism of tandem duplication formation in BRCA1-mutant cells. Nature 551, 590–595 (2017).
Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020).
Koren, A. et al. Differential relationship of DNA replication timing to different forms of human mutation and variation. Am. J. Hum. Genet. 91, 1033–1040 (2012).
Howlett, N. G., Taniguchi, T., Durkin, S. G., D’Andrea, A. D. & Glover, T. W. The Fanconi anemia pathway is required for the DNA replication stress response and for the regulation of common fragile site stability. Hum. Mol. Genet. 14, 693–701 (2005).
Zeman, M. K. & Cimprich, K. A. Causes and consequences of replication stress. Nat. Cell Biol. 16, 2–9 (2014).
Campbell, J. D. et al. Genomic, pathway network, and immunologic features distinguishing squamous carcinomas. Cell Rep. 23, 194–212.e196 (2018).
Marsit, C. J. et al. Inactivation of the Fanconi anemia/BRCA pathway in lung and oral cancers: implications for treatment and survival. Oncogene 23, 1000–1004 (2004).
Wreesmann, V. B., Estilo, C., Eisele, D. W., Singh, B. & Wang, S. J. Downregulation of Fanconi anemia genes in sporadic head and neck squamous cell carcinoma. ORL J. Otorhinolaryngol. Relat. Spec. 69, 218–225 (2007).
Harding, S. M. et al. Mitotic progression following DNA damage enables pattern recognition within micronuclei. Nature 548, 466–470 (2017).
Heddle, J. A., Lue, C. B., Saunders, E. F. & Benz, R. D. Sensitivity to five mutagens in Fanconi’s anemia as measured by the micronucleus method. Cancer Res. 38, 2983–2988 (1978).
Mackenzie, K. J. et al. cGAS surveillance of micronuclei links genome instability to innate immunity. Nature 548, 461–465 (2017).
Velleuer, E. et al. Diagnostic accuracy of brush biopsy-based cytology for the early detection of oral cancer and precursors in Fanconi anemia. Cancer Cytopathol. 128, 403–413 (2020).
Puram, S. V. et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell 171, 1611–1624.e1624 (2017).
Ji, A. L. et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell 182, 497–514.e422 (2020).
The Cancer Genome Atlas Research Network. Integrated genomic characterization of oesophageal carcinoma. Nature 541, 169–175 (2017).
The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Molenaar, J. J. et al. Sequencing of neuroblastoma identifies chromothripsis and defects in neuritogenesis genes. Nature 483, 589–593 (2012).
Bakhoum, S. F. et al. Chromosomal instability drives metastasis through a cytosolic DNA response. Nature 553, 467–472 (2018).
Cerami, E. et al. The cBio Cancer Genomics Portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).
Auerbach, A. D. & Schroeder, T. M. First announcement of the Fanconi Anemia International Registry. Blood 60, 1054 (1982).
Nowak, J. A. & Fuchs, E. Isolation and culture of epithelial stem cells. Methods Mol. Biol. 482, 215–232 (2009).
Schober, M. & Fuchs, E. Tumor-initiating stem cells of squamous cell carcinomas and their control by TGF-β and integrin/focal adhesion kinase (FAK) signaling. Proc. Natl Acad. Sci. USA 108, 10544–10549 (2011).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).
Ellis, P. et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protoc. 16, 841–871 (2021).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Tate, J. G. et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Gori, K. & Baez-Ortega, A. sigfit: flexible Bayesian inference of mutational signatures. Preprint at bioRxiv https://doi.org/10.1101/372896 (2020).
Wang, S., Tao, Z., Wu, T. & Liu, X. S. Sigflow: an automated and comprehensive pipeline for cancer genome mutational signature analysis. Bioinformatics 37, 1590–1592 (2021).
Van Doorslaer, K. et al. The papillomavirus episteme: a major update to the Papillomavirus Sequence Database. Nucleic Acids Res. 45, D499–D506 (2017).
Talevich, E., Shain, A. H., Botton, T. & Bastian, B. C. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput. Biol. 12, e1004873 (2016).
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
Grossman, R. L. et al. Toward a shared vision for cancer genomic data. N. Engl. J. Med. 375, 1109–1112 (2016).
Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).
Oesper, L., Satas, G. & Raphael, B. J. Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data. Bioinformatics 30, 3532–3540 (2014).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Barlow, J. H. et al. Identification of early replicating fragile sites that contribute to genome instability. Cell 152, 620–632 (2013).
Bignell, G. R. et al. Signatures of mutation and selection in the cancer genome. Nature 463, 893–898 (2010).
Fungtammasan, A., Walsh, E., Chiaromonte, F., Eckert, K. A. & Makova, K. D. A genome-wide analysis of common fragile sites: what features determine chromosomal instability in the human genome. Genome Res. 22, 993–1005 (2012).
Crosetto, N. et al. Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing. Nat. Methods 10, 361–365 (2013).
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
Geoffroy, V. et al. AnnotSV: an integrated tool for structural variations annotation. Bioinformatics 34, 3572–3574 (2018).
Clarke, L. et al. The International Genome Sample Resource (IGSR): a worldwide collection of genome variation incorporating the 1000 Genomes Project data. Nucleic Acids Res. 45, D854–D859 (2017).
Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://arxiv.org/abs/1207.3907 (2012).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Soneson, C., Love, M. I. & Robinson, M. D. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 4, 1521 (2015).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Hanzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 14, 7 (2013).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e3529 (2021).
Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2020).
Efremova, M., Vento-Tormo, M., Teichmann, S. A. & Vento-Tormo, R. CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes. Nat. Protoc. 15, 1484–1506 (2020).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e1821 (2019).
Waltman, L. & Van Eck, N. J. A smart local moving algorithm for large-scale modularity-based community detection. Eur. Phys. J. B 86, 471 (2013).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Bergenstrahle, J., Larsson, L. & Lundeberg, J. Seamless integration of image and molecular analysis for spatial transcriptomics workflows. BMC Genomics 21, 482 (2020).
Mayakonda, A., Lin, D. C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28, 1747–1756 (2018).
Raine, K. M. et al. ascatNgs: identifying somatically acquired copy-number alterations from whole-genome sequencing data. Curr. Protoc. Bioinformatics 56, 15.19.11–15.19.17 (2016).
Thorvaldsdottir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinformatics 14, 178–192 (2013).
We thank the participants and their families who donated their tissues to IFAR; the physicians who provided research samples and clinical information; the staff of Fanconi Anemia Research Fund, especially S. Planck for referrals to IFAR; National Disease Research Interchange (NDRI) for providing samples; M. Grompe for providing Fanca mutant mice; members of the Laboratory of Genome Maintenance for advice; N. Papazian and T. Cupedo for assistance with sample processing; E. Bindels for assistance with single-cell analysis; staff of the Genomic, Reference Genome, Bioinformatics and Flow Cytometry resource centres at the Rockefeller University for their expert advice and contribution; E. Fuchs and members of her laboratory for advice on keratinocyte growth conditions. Genomic data from non-FA SCCs are in part based on data generated by the TCGA Research Network (https://www.cancer.gov/tcga). This study was supported by the Pershing Square Sohn Prize for Young Investigators in Cancer Research (A.S.), Fanconi Anemia Research Fund (R.D., E.V. and A.S.), V Foundation grant T2019-013 (A.S.), National Institutes of Health (NIH) National Heart Lung and Blood Institute (R01 HL120922) (A.S.), National Cancer Institute (R01 CA204127) (A.S.), National Center for Advancing Translational Sciences (UL1 TR001866) (R.V. and A.S.), NIH award 1DP2-GM123495 (A.K.). S.C.C. acknowledges support from the Intramural Research Program of the NIH National Human Genome Research Institute. M.A.S. is supported by a Rubicon fellowship from NWO (019.153LW.038) and a KWF Kankerbestrijding Young Investigator Grant (12797/2019-2, Bas Mulder Award; Dutch Cancer Foundation). A.S. is a Howard Hughes Faculty Scholar.
Rocket Pharmaceuticals provided research funding and partial salary support to A.S. for an unrelated project. P.J.C. is a founder, director and consultant for Mu Genomics. B.S. is a co-inventor of intellectual property related to DCN1 small molecule inhibitors licensed by MSK to Cinsanso. He has rights to receive royalty income as a result of this arrangement. MSK has financial interests related to this intellectual property and Cinsanso as a result of this arrangement. The other authors declare no competing interests.
Peer review information
Nature thanks Anthony Nichols, Tobias Rausch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a Age at diagnosis for the FA SCCs for which complete clinical data were available (n = 41), HPV+ (n = 71) and HPV- (n = 415) sporadic HNSCC cohorts. Clinical data for sporadic HNSCC cohorts were obtained from the TCGA database. b and c Characteristics of the FA individuals in this study. Some individuals had multiple cancers. For these cases, survival was calculated from the first cancer sequenced in this study. * numbers are based on 41 individuals with complete history. ** based on the first sample sequenced if multiple tumors were sequenced. d Type and tissue site of the sequenced tumors. * two were in pyriform sinus, one in oropharynx; ** cell lines are from the tongue, pharynx, and oral cavity. *** one of these samples is a metastasis to a lymph node of another tumor in this set. e TP53 variant allele frequency (%) spread for n = 43 biologically independent FA SCCs with a TP53 SNV or indel mutation. Mutant allele frequency was corrected for individual tumor purity as calculated by Theta2. Median and IQR are indicated. f Oncoplot of the FA SCC cohort indicating the variant type by color, with the affected gene listed on the left. Recurrent focal sCNAs were defined by GISTIC2. Amplifications were classified as log2(sCNA) ≥ 0.9 and focal deletions as log2(sCNA) ≤ −0.9 after normalizing for tumor purity. Samples are stratified by SCC tissue subtype. One adenocarcinoma sample (cervical adenocarcinoma) is shown, while the bladder and intestinal adenocarcinomas are not displayed. The y-axis of the top graph indicates the number of total somatic gene alterations from the GISTIC2 and SNV/indel analysis. In all cases, n refers to independent biological samples or individuals.
a Comparison of tumor mutation burden between TCGA cohorts and the FA SCC cohort (n = 55 independent SCCs). Each dot represents the number of exonic SNV and indel mutations detected per sample, with median mutation burden indicated by a black horizontal line. FA SCC samples are colored red and TCGA-HNSCC samples are colored blue. b sigfit (Bayesian procedure) and Sigflow (bootstrapping procedure) extraction of COSMIC single-base substitution (SBS) signatures from n = 13 HSCT-negative FA SCC whole-exome samples (each with > 100 SNVs) and n = 4 HSCT-negative FA SCC whole-genome samples (with SNV calls restricted to the exome). c sigfit and Sigflow extraction of SBS signatures from n = 4 HSCT-negative FA SCC whole-genome samples, surveying genome-wide SNVs. d sigfit and Sigflow extraction of COSMIC indel (ID) signatures from n = 4 HSCT-negative FA SCC whole-genome samples with matched normal controls. Mutation fraction indicates the fraction of tumor mutations that can be explained by a particular signature. Signature exposure (sig exposure) is the number of mutations that contributed to a particular signature. For sigfit, error bars indicate the 95% highest posterior density (HPD) intervals. Grey bars indicate non-significant signature exposures, defined as exposures for which the lowest HPD limit is less than 0.01. For Sigflow, boxplots indicate the median signature exposure value and interquartile range (IQR). Whisker ends are positioned at Q1 (first quartile) − 1.5×IQR, or at the minimum value when larger than this lower range value, and Q3 (third quartile) + 1.5×IQR, or at the maximum value when smaller than this upper range value. In all cases, n refers to independent biological samples.
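The whisker rule stated for the Sigflow boxplots can be written out explicitly. The following is a minimal Python sketch of that rule only, not the Sigflow implementation. The function name `tukey_whiskers` and the exposure values are hypothetical, and the quartiles use the inclusive method of Python's `statistics.quantiles`, which may differ slightly from the quartile convention of the plotting software.

```python
from statistics import quantiles


def tukey_whiskers(values):
    """Return (lower whisker, Q1, median, Q3, upper whisker).

    Whiskers follow the caption: Q1 - 1.5*IQR or the minimum if that is
    larger, and Q3 + 1.5*IQR or the maximum if that is smaller.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = max(min(values), q1 - 1.5 * iqr)
    upper = min(max(values), q3 + 1.5 * iqr)
    return lower, q1, median, q3, upper


# Hypothetical bootstrap exposures with one extreme value.
exposures = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = tukey_whiskers(exposures)
print(summary)
```

In this toy example the extreme value lies beyond Q3 + 1.5×IQR, so the upper whisker stops at that limit rather than at the maximum.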
a Plot displaying chromosomal locations of recurrent focal amplification peaks detected by GISTIC2 in all FA SCCs (n = 60 samples, including 55 independent SCCs, 2 SCC metastases, and 3 SCC samples sequenced by both WGS/WES) and one cervical adenocarcinoma. GISTIC2 q-value is shown below, with default minimum calling threshold displayed as a green line. b A plot displaying chromosome location of recurrent focal deletion peaks detected by GISTIC2 in all FA SCCs (n = 60). GISTIC2 q-value is shown below, with default minimum calling threshold displayed as a green line. c Copy-number alteration heatmap displaying detected sCNAs for all FA SCCs (n = 60) and one FA-associated cervical adenocarcinoma, colored by amplitude intensity and normalized for individual tumor purity. Each row is a tumor sample. d Comparison of focal sCNA numbers between FA SCC, HPV+ sporadic HNSCC, HPV− sporadic HNSCC, BRCA2mut carcinomas, and BRCA1mut carcinomas. For FA SCC, n = 20 whole-genome & n = 40 whole-exome samples are displayed and colored by sample type. For HPV+ sporadic HNSCC, n = 18 whole-genome samples and n = 71 genome-wide CNV array (CGH) are shown separately. For HPV− sporadic HNSCC, n = 24 whole-genome samples and n = 415 CGH samples are displayed separately. For BRCA2mut carcinomas n = 41 whole-genome samples are shown. For BRCA1mut carcinomas, n = 24 whole-genome samples are displayed. Focal copy number alterations are defined by GISTIC2, and gated at log2(sCNA) ≥ 0.9 or log2(sCNA) ≤ −0.9 after correcting for tumor purity. Two-tailed Mann-Whitney U test p-values are indicated, with median and IQR shown. e ASCAT plot of a WGS FA SCC (F17P1). Total copy number is represented by the purple line. Minor allele is represented by the blue line. Indicated are notable oncogenes and tumor suppressors localizing to focal sCNA regions. f Genomic Circos plot displaying all somatic SV events detected by Illumina WGS of sample F17P1 depicted in panel e. 
g ASCAT plots displaying allele-specific CNAs in select WES-sequenced FA SCC tumors with little to no detectable HSCT-donor SNP contamination. Upper left (F32P1), upper right (F16P1-Vulv), bottom left (F4P1), bottom right (F25P1). F32P1 is HPV+, but harbors somatic deletions of TP53 and CDKN2A. In all cases, n refers to independent biological samples.
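The gating used for focal copy-number calls (log2(sCNA) ≥ 0.9 for amplifications, ≤ −0.9 for deletions) can be illustrated with a short Python sketch. It assumes the log2 ratios have already been corrected for tumor purity, a step not reproduced here. The function name `classify_focal_scna` and the per-gene values are hypothetical and are not taken from the study data.

```python
from collections import Counter

AMP_THRESHOLD = 0.9   # log2(sCNA) at or above this is called an amplification
DEL_THRESHOLD = -0.9  # log2(sCNA) at or below this is called a deletion


def classify_focal_scna(log2_ratio):
    """Classify one purity-corrected log2 copy-number ratio."""
    if log2_ratio >= AMP_THRESHOLD:
        return "amplification"
    if log2_ratio <= DEL_THRESHOLD:
        return "deletion"
    return "neutral"


# Hypothetical purity-corrected log2 ratios for one tumor sample.
sample = {"CCND1": 1.7, "CDKN2A": -2.1, "PIK3CA": 0.9, "PTEN": -0.4}
calls = {gene: classify_focal_scna(value) for gene, value in sample.items()}
tally = Counter(calls.values())
print(calls)
print(tally)
```

Both thresholds are inclusive, matching the ≥ and ≤ signs in the captions.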
a Scatter plot displaying localization of 8,896 SV breakpoints (from 4,448 SVs) in FA SCC by chromosome and genomic position. Relative breakpoint density is indicated by height from the baseline. Annotated are curated oncogenes and tumor suppressors localizing to breakpoint hotspots. b Hatchet subclonal absolute copy-number prediction of a low-HSCT+ FA SCC (F44P1). A copy number of 2 is considered copy-neutral. Individual predicted subclones (n = 3) are displayed as distinct colored lines. c Battenberg-DPClust decomposition of 4 HSCT-negative FA tumor whole-genome samples with matched normal controls. Each annotated peak is a detected clone within the SCC, with peak area indicating fractional composition of tumor cells.
Extended Data Fig. 5 SV breakpoint localization of FA SCC, sporadic HNSCC, BRCA2mut, and BRCA1mut tumors in relation to genome replication timing and fragile sites.
a Replication timing of the SV breakpoint loci. Plotted in black is the expected replication timing distribution. Plotted in blue is the observed SV breakpoint localization to early, mid, or late replicating genomic regions. Vertical axis indicates relative abundance and horizontal axis indicates standard deviation from mean replication timing. Kolmogorov-Smirnov (KS) p-values are indicated. n corresponds to the number of breakpoints included in the sample for each analysis. b Binned SV breakpoint counts from the indicated cohorts and SV class, localizing to common and rare fragile sites. SV class highlighted in red indicates a significant association, as determined by indicated p-value of two-tailed z-score test compared to 1000 permutations of fragile site locations. c Binned SV breakpoint counts localizing to “early-replicating fragile sites”. SV class highlighted in red indicates a significant association, as determined by indicated p-value of two-tailed z-score test compared to 1000 permutations of fragile site locations.
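The fragile-site association test in panels b and c compares an observed breakpoint count with counts obtained after permuting fragile site locations 1000 times. The Python sketch below shows one way such a permutation z-score test could be set up. It is an illustration under stated assumptions, not the procedure used in the study: the genome is a single linear coordinate range, sites are re-placed uniformly at random with their lengths preserved, and all names and coordinates are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def count_hits(breakpoints, sites):
    """Count breakpoints that fall inside at least one [start, end) site."""
    return sum(any(start <= bp < end for start, end in sites) for bp in breakpoints)


def shuffle_sites(sites, genome_length, rng):
    """Re-place each site uniformly at random, keeping its length (assumption)."""
    shuffled = []
    for start, end in sites:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def permutation_z_test(breakpoints, sites, genome_length, n_perm=1000, seed=0):
    """Return (observed count, z-score, two-tailed normal p-value)."""
    rng = random.Random(seed)
    observed = count_hits(breakpoints, sites)
    null = [
        count_hits(breakpoints, shuffle_sites(sites, genome_length, rng))
        for _ in range(n_perm)
    ]
    mu, sd = mean(null), stdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    z = (observed - mu) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value from the z-score
    return observed, z, p


# Hypothetical toy genome: 20 breakpoints clustered in two sites, 20 background.
genome_length = 1_000_000
sites = [(100_000, 110_000), (500_000, 520_000)]
breakpoints = (
    [100_000 + 500 * i for i in range(10)]
    + [500_000 + 1_000 * i for i in range(10)]
    + [25_000 + 50_000 * i for i in range(20)]
)
observed, z, p = permutation_z_test(breakpoints, sites, genome_length)
print(observed, round(z, 2), p)
```

A real analysis would permute within chromosomes and respect unmappable regions; those details are omitted here.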
Extended Data Fig. 6 Complex SVs in FA SCC and the transcriptional landscapes of FA SCC and sporadic HNSCC.
a Number of somatic SV chains detected in 10x linked-read WGS of FA SCCs (n = 4), where a chain is defined as ≥ 2 discrete SVs (≥ 4 unique breakpoints). Median and IQR are indicated. b Number of SVs present in 108 SV chains in 10x linked-read WGS of FA SCCs. Mean (4.6 SVs) and IQR are indicated. c Number of SVs of indicated class present in 108 SV chains from 10x linked read WGS of FA SCCs. Means and IQRs are indicated. d SV breakpoint distribution from 108 SV chains stratified by human chromosome number. e Somatic SV burden of n = 9 PacBio-sequenced FA SCCs. 3 samples (indicated) were sequenced to 10x average coverage, and 6 samples were sequenced to 30x average coverage. f Somatic SV class proportions in n = 9 PacBio-sequenced FA SCCs. Medians and IQRs are indicated. g Illumina & PacBio % SV call overlap for SVs > 1kb and deletions < 1kb for n = 9 FA SCCs sequenced on both platforms. Shown are % of PacBio SV calls > 1kb present in Illumina BRASS output, % of PacBio deletion calls < 1kb present in Illumina indel calls, and % of Illumina SV calls > 1kb present in PacBio BAMs. Median and IQR are indicated. h Comparison of deletion sizes (<1kb) detected by SV calling in n = 9 PacBio FA SCCs and by indel calling in the same 9 FA SCCs sequenced by Illumina WGS. Median and IQR are shown. i Examples of fold-back inversions (FBI) driving sharp copy-number change at key oncogenic loci identified in FA SCCs (PacBio data). j Comparison of the raw number of unbalanced translocation events in FA SCC (n = 20), HPV-negative sporadic HNSCC (n = 23), BRCA2mut (n = 41), and BRCA1mut (n = 24) cohorts. Two-tailed Mann-Whitney U test p-values are indicated, with median and IQR shown. k Comparison of hg19 expected vs. observed percentage of somatic SV breakpoints in 9 PacBio-sequenced FA SCCs that localize to repeat regions. Unpaired two-tailed t-test p-value is indicated (t = 7.371, df = 8), with median and IQR shown. 
l Breakpoint density graph displaying GC% sequence composition within +/− 100bp from SV breakpoints identified in PacBio sequencing data, calculated relative to hg19 global GC% frequency (40.9%) (notated as “expected”). Median and IQR are displayed. m Comparison of hg19 expected vs. observed percentage of somatic SV breakpoints from FA SCCs (n = 20) and HPV-negative sporadic HNSCC cohorts (n = 23) that localize to repeat regions and to the indicated repeat class (Illumina WGS). Two-tailed Mann-Whitney U test p-values are indicated, with median and IQR shown. n Comparison of the number of retrotransposon element (RTE) insertions in FA SCC (n = 20), HPV-negative sporadic HNSCC (n = 23), BRCA2mut (n = 41), and BRCA1mut (n = 24) cohorts. Two-tailed Mann-Whitney U test p-values are indicated, with median and IQR shown. o Cancer-relevant genes differentially expressed between FA SCC (n = 6) and sporadic HNSCC (n = 520) as assessed by RNAseq, including genes displayed in Fig. 1c. Differential expression is gated at log2(FC) > 1 or log2(FC) < −1 with DESeq2 FDR-adjusted p-value < 0.05. DESeq2 implementation of Wald test with FDR-adjusted p-value is indicated. Genes whose relative expression is impacted by an sCNA are colored orange. Genes whose relative expression is discordant with sCNA frequency are colored blue. Genes not identified in focal sCNA peaks are colored white. GAPDH and PGK1 are indicated in black and added as housekeeping controls. p Quality-control distribution graph showing log2(FC) values of all genome-wide transcripts comparing FA SCC (n = 6) vs sporadic HNSCC (n = 520). Median and IQR are displayed. q DNA repair genes differentially expressed in FA SCC (n = 6) versus sporadic HNSCC (n = 520) by RNAseq. Differential expression is gated at log2(FC) > 1 or log2(FC) < −1 with DESeq2 FDR-adjusted p-value < 0.05. DESeq2 implementation of Wald test with FDR-adjusted p-value is indicated. 
r Aldehyde dehydrogenase (Aldh) and alcohol dehydrogenase (Adh) genes differentially expressed between FA SCC (n = 6) and sporadic HNSCC (n = 520). Differential expression is gated at log2(FC) > 1 or log2(FC) < −1 with DESeq2 FDR-adjusted p-value < 0.05. DESeq2 implementation of Wald test with FDR-adjusted p-value is indicated. s Gene-set enrichment/depletion (GO) analysis of genes differentially expressed between FA SCC and sporadic HNSCC. Genes entered into analysis were gated at log2(FC) > 1 or log2(FC) < −1 with DESeq2 FDR-adjusted p-value < 10^−5. Gene sets were gated at > 2-fold enrichment over expected background with GO Fisher’s exact test FDR-adjusted p-value < 0.01 to be reported in the figure. In all cases, n refers to independent biological samples.
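The differential-expression gate used in panels o, q and r (log2(FC) > 1 or < −1 with FDR-adjusted p-value < 0.05) can be sketched in Python as follows. This is an illustration rather than the DESeq2 workflow: it assumes Benjamini-Hochberg adjustment of raw Wald p-values and omits DESeq2's independent filtering. The function names and the gene-level values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def gate_genes(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Keep genes with |log2FC| > cutoff and adjusted p-value < cutoff.

    `results` maps gene name -> (log2 fold change, raw p-value).
    """
    names = list(results)
    padj = benjamini_hochberg([results[g][1] for g in names])
    return {
        g: ("up" if results[g][0] > 0 else "down")
        for g, q in zip(names, padj)
        if abs(results[g][0]) > lfc_cutoff and q < padj_cutoff
    }


# Hypothetical results: (log2 fold change, raw p-value) per gene.
results = {
    "GENE_A": (2.3, 0.0001),
    "GENE_B": (-1.8, 0.002),
    "GENE_C": (0.4, 0.0005),  # significant but below the fold-change gate
    "GENE_D": (1.5, 0.2),     # large fold change but not significant
}
gated = gate_genes(results)
print(gated)
```

Both conditions must hold, so a gene with a small fold change is excluded even when its adjusted p-value is very low.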
a Oncoplot of 415 HPV-negative sporadic HNSCCs, displaying somatic copy-number alteration (sCNA) or SNV/indel-alteration of FA pathway genes or ALDH2. Mutation type is indicated in the legend. Top bar graph indicates the relative copy-number instability of each sample. Blue indicates deletions, magenta indicates amplifications. GISTIC2 q-values (FDR): XRCC2 (1×10^−22), MAD2L2 (1×10^−7), RAD51 (1×10^−1), ALDH2 (2×10^−1). b Mutational frequency of key HNSCC driver genes in HPV-negative sporadic HNSCC samples with MAD2L2, ALDH2, RAD51, or XRCC2 deletions (n = 52) versus entire HPV-negative TCGA-HNSCC cohort (n = 415). GISTIC2 q-values (FDR): CDKN2A (4×10^−264), PTPRD (7×10^−40), KMT2C (1×10^−22), PIK3CA (5×10^−57), NSD1 (1×10^−4), CSMD1 (9×10^−101), LATS2 (2×10^−23), MXD4 (5×10^−4), CCND1 (8×10^−252), FAT1 (9×10^−36), SDHB (7×10^−7), NOTCH2 (8×10^−19), MYC (6×10^−22), NOTCH1 (2×10^−3), DIP2C (3×10^−2), NCOR2 (2×10^−1), TGIF (4×10^−6), PTEN (4×10^−11), EGFR (1×10^−52). c Number of focal copy-number alterations in sporadic HNSCC tumors (n = 321 samples with data on smoking history), stratified by number of cigarette pack-years associated with each sample. Shown are cases with zero pack-years (no recorded smoking), cases with more than one (>1) pack-year, and cases with more than two (>2), more than three (>3), more than four (>4) and more than eight (>8) pack-years. Two-tailed Mann-Whitney U test p-values are indicated, with median and IQR shown. d HPV-negative sporadic HNSCC samples (n = 415) ranked by number of focal somatic copy-number alteration (sCNA) peaks as defined by GISTIC2. Annotated are the top and bottom sCNA quartiles, with the top quartile being most unstable and the bottom quartile being most stable. Median and IQR displayed. e Comparison of the number of cigarette pack-years for smokers in top (n = 104) and bottom (n = 104) copy-number quartiles. Two-tailed Mann-Whitney U test p-value is indicated, with median and IQR shown. 
f Bar chart indicating the proportion (%) of samples within top and bottom sCNA quartiles exhibiting each respective COSMIC signature ID3, ID8, SBS4, or DBS2. Annotated are fold-differences in these proportions. g Comparison of the total number of ID3, ID8, SBS4, and DBS2 signature events between top (n = 104) and bottom (n = 104) sCNA quartiles. Indicated in brackets is the proportion (%) of total SBS, DBS, or ID events represented by the respective signature in each sCNA quartile. In all cases, n refers to independent biological samples.
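The quartile stratification in panels d to g can be sketched as follows. This is a minimal sketch under stated assumptions: the quartile size is taken as n/4 rounded to the nearest integer (which gives 104 for n = 415, matching the legend), ties are broken by sort order, and `split_quartiles` and `fold_difference` are hypothetical helper names rather than functions from the study's scripts.

```python
def split_quartiles(scna_counts):
    """Rank samples by focal sCNA count and return (top, bottom) quartiles.

    scna_counts maps sample ID -> number of focal sCNA peaks.
    Assumption: each quartile holds round(n / 4) samples.
    """
    ranked = sorted(scna_counts, key=scna_counts.get, reverse=True)
    q = round(len(ranked) / 4)
    return ranked[:q], ranked[-q:]


def fold_difference(top_hits, top_n, bottom_hits, bottom_n):
    """Ratio of the proportion of signature-positive samples, top vs bottom."""
    return (top_hits / top_n) / (bottom_hits / bottom_n)


# Synthetic counts for illustration only: sample S<i> carries i sCNA peaks.
synthetic = {f"S{i}": i for i in range(415)}
top_quartile, bottom_quartile = split_quartiles(synthetic)
```

With 415 samples, each returned quartile contains 104 sample IDs, and the middle 207 samples are left out of the comparison.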
a In vitro cell growth curve of pre-engraftment Fanca+/+ and Fanca−/− keratinocytes, measured by cell count over six days with three independent experimental replicates per genotype. Data points indicate the mean cell count and bars indicate standard deviation. b Mean replicate tumor volumes measured at multiple time points during the 2nd, 6th, and 11th engraftment cycles of Fanca+/+ and Fanca−/− keratinocytes. Each genotype has 4 independent replicates, each of which in turn comprises 4 co-engrafted tumor sites on a single mouse (for a total of 16 tumors per genotype). Each data point represents one replicate as the mean volume of its 4 constituent tumors at the specified time point, with standard error bars indicated. 100 × 10³, 70 × 10³, and 35 × 10³ cells were engrafted at the 2nd, 6th, and 11th engraftments, respectively. 1st engraftment data are shown in Fig. 4c. Fanca−/− was reduced to 3 replicates at the 6th and 11th cycles due to recurrent loss through host death. c Number of tumor SVs categorized by class: inversion (INV), deletion (DEL), tandem duplication (TD), translocation (TRA) in n = 4 Fanca+/+ and n = 3 Fanca−/− replicates from the 6th engraftment cycle. Two-tailed unpaired t-test p-values displayed (inversions: t = 2.934, df = 5), with medians and IQRs indicated. d Proportion (%) of SVs represented by each class in n = 4 Fanca+/+ and n = 3 Fanca−/− replicates at the 6th engraftment cycle. Two-tailed unpaired t-test p-values displayed (inversions: t = 2.666, df = 5), with medians and IQRs indicated. e Unsupervised-clustering heatmap displaying differential transcriptomic gene-set enrichment across all replicates at pre-engraftment and 1st, 2nd, 6th, and 11th engraftment cycles for Fanca+/+ and Fanca−/− genotypes (32 samples). Relative gene-set enrichment or depletion is indicated by color scale at each time point (ANOVA test). Gene sets displayed have an FDR-adjusted p-value < 10⁻⁷. Pre indicates pre-engraftment, E indicates engraftment, R indicates replicate.
f RNAseq differential expression heatmap across all replicates displaying time-course expression changes in genes associated with keratinocyte identity, EMT transition, and inflammation/immune cell activation. Heatmap color indicates Z-scaled log2-normalized expression (32 samples). Pre indicates pre-engraftment, E indicates engraftment, R indicates replicate. In all cases, n refers to independent biological samples.
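The df = 5 reported for the unpaired t-tests in panels c and d is what an equal-variance (pooled) test gives for 4 and 3 replicates, since df = 4 + 3 − 2. The sketch below shows that calculation. It assumes a pooled-variance Student's t-test, which the legend does not state explicitly, and the `pooled_t` helper and input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's unpaired t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


# Made-up values for illustration only; not SV counts from the study.
group_a = [1.0, 2.0, 3.0, 4.0]  # four replicates
group_b = [2.0, 4.0, 6.0]       # three replicates
t_stat, dof = pooled_t(group_a, group_b)
```

A Welch test on the same data would generally produce a non-integer df, so an integer df of 5 is consistent with the pooled form.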
Extended Data Fig. 9 Single cell transcriptomics of case F44P1 and integration with other single-nuclei FA SCC samples.
a Feature plots superimposed on a UMAP embedding displaying cell type identity markers corresponding to the annotated clusters in Fig. 4g. Macrophage (CD163), CD4+ T cells (CD4), CD8+ T cells (CD8A), KRT14/5+ tumor keratinocytes (KRT14), neutrophils (HCAR2), fibroblasts (COL11A1), mast cells (TPSAB1), Langerhans dendritic cells (CD207), p-EMT tumor keratinocytes (LAMA3), myofibroblasts (ACTA2), differentiated tumor keratinocytes (SPRR2E), endothelial cells (VWF). See Methods for additional markers used for identification. b ASCAT plot of WGS sample F44P1 (top), inferred single-cell copy-number analysis displaying distinct amplifications in tumor keratinocyte, p-EMT tumor keratinocyte, and differentiated tumor keratinocyte clusters (bottom). c Feature plot displaying the scTSK sensor score for case F44P1. d Feature plots displaying a selection of scTSK markers. e Feature plot displaying p-EMT sensor score for case F44P1. f Feature plots displaying a selection of p-EMT markers. g Fold-enrichment in gene expression between p-EMT vs non-EMT tumor keratinocytes in F44P1 (DESeq2 log2(x) > 0.2, Wald test with FDR-adjusted p-value < 0.05) shown by GO term. GO enrichment Fisher’s exact test FDR-adjusted p-value displayed. h UMAP embedding displaying the integrated clustering of F44P1 (single-cell), F46P1 (single-nuclei), and F38P1 (single-nuclei) samples after quality control (k = 1,986 cells). Cell type identities are indicated in the legend. i scTSK and p-EMT sensor scores of integrated samples, split by constituent tumor sample. Also see Supplementary Fig. 1 for examples of cellular markers used in h and i.
Extended Data Fig. 10 Spatial transcriptomics of FA SCC and fibroblast-tumor keratinocyte interactions.
a left to right: H&E-stained scan of sample F38P1 showing a scale bar, spatial feature plots of CCND1, EGFR, SNAI2, LAMC2, TGFBI expression and imputed G1/G2-M/S cell-cycle stage. b GSEA EMT hallmark enrichment plot, assessed using a pre-ranked gene list determined from differential expression analysis between the p-EMT tumor cluster 6 against the remaining tumor clusters. EMT hallmark enrichment and normalized enrichment scores were 0.64722323 and 2.0873358, respectively, with the nominal p-value = 0 and the adjusted FDR value = 0. c ASCAT plot of the F38P1 WGS sample (top). Inferred single-spot copy-number analysis displaying distinct amplifications in tumor versus normal tissue (bottom). d Location of tumor keratinocytes and adjacent non-tumor stroma (top) used for spatial neighborhood analysis. Differential expression between tumor keratinocyte spots and directly adjacent stromal spots (bottom). e Ligand-receptor interaction analysis between tumor-associated fibroblasts and p-EMT tumor keratinocytes vs. non-EMT tumor keratinocytes in F44P1 single-cell sample.
Supplementary Discussion, Supplementary References and Supplementary Figs. 1–4.
Clinical and pathology data for human samples sequenced in this study.
Alignment of FA SCC and sporadic HNSCC samples against HPV genomes.
Somatic SNV and Indel calls in FA SCCs and sporadic HPV-negative HNSCCs.
CNV frequencies and defined focal amplification and deletion peaks in FA SCC and sporadic HNSCC.
SV Calls in FA SCCs, sporadic HNSCCs, BRCA2mut, and BRCA1mut tumours.
SV Calls from PacBio-sequenced FA SCCs.
Quantification of fold back inversions, fold back inversion-templated insertion chains, retrotransposon element insertions, and select reciprocal inversions (classical inversion) identified in FA SCC and sporadic HNSCC cohort (Illumina WGS).
FA SCC vs. sporadic HNSCC (TCGA) differentially expressed genes.
Methylation array analysis of FA SCCs.
Mouse Tumour SV and RNAseq expression data analysis.
Single-Cell RNAseq cluster differential expression.
Integrated FA SCC and Sporadic SCC Mutation Database including Figure_Generation_Script.R and Webster_Sanders_2022_SCC_Integrated_Mutation_Database.Rdata.
Cite this article
Webster, A.L.H., Sanders, M.A., Patel, K. et al. Genomic signature of Fanconi anaemia DNA repair pathway deficiency in cancer. Nature 612, 495–502 (2022). https://doi.org/10.1038/s41586-022-05253-4