A minimal Fanconi Anemia complex in early diverging fungi

The Fanconi Anemia (FA) pathway resolves DNA interstrand crosslinks (ICLs). The FA pathway was initially recognized in vertebrates, later confirmed in other animals, and has been suggested to exist in fungi. The FA proteins FANCM, FANCL and FANCJ are present in Saccharomyces cerevisiae, but how they interact to resolve ICLs is still unclear. Unlike Dikarya, early diverging fungi (EDF) share more traits with animals. We traced the evolutionary history of the FA pathway across Opisthokonta. We scanned complete proteomes for FA-related homologs to establish their taxonomic distribution and analyzed their phylogenetic trees. We examined the transcription profiles of FA genes to test whether they respond to environmental conditions, and their genomic locations to test for co-localization. We identified fungal homologs of the activation and ID complexes, 5 of the 8 core proteins, all of the endonucleases, and the deubiquitination proteins. All fungi lack FANCC, FANCF and FANCG, proteins responsible for post-replication repair and chromosome stability in animals. The observed taxonomic distribution can be attributed to a gradual degradation of the FA pathway from EDF to Dikarya. One key difference is that EDF retain the ID complex, which recruits endonucleases to the ICL site. Moreover, 21 of the 32 identified FA genes are upregulated in response to different growth conditions. Several FA genes are co-localized in fungal genomes, which could also facilitate co-expression. Our results indicate that a minimal FA pathway might still be functional in Mucoromycota, with a gradual loss of components in Dikarya ancestors.

Table 1. Composition of Fanconi Anemia pathway genes, gene name aliases and protein domains in the protein products of FA genes (subcomplex nomenclature after Niraj and co-workers 7 ).

… FANCF 13 . The FANCA-FANCG-FAAP20 subcomplex aids in strand annealing and strand exchange 14 . FANCM, a multi-domain protein, binds to the core complex through the interaction of its MM1 domain with FANCF. Its MM2 domain, on the other hand, interacts with the BTR complex and participates in homologous recombination 15 .
The molecular functions of the remaining core proteins remain elusive, apart from DNA binding. The monoubiquitination of FANCD2 and FANCI, forming the ID complex, is the hallmark of the FA pathway. The ID complex facilitates recruitment of the downstream effector proteins to the ICL 8 . The process involves attachment of a single ubiquitin molecule to a specific site in FANCD2 and FANCI 16 and is carried out by the FA core complex together with UBE2T. The E3 ligase FANCL acts as the catalyst: its C-terminal RING domain binds UBE2T and its central double RWD domain binds FANCD2 8 . The RWD domain is also reported to stimulate the activity of UBE2T 17 . In parallel with UBE2T-dependent ubiquitination, ATR phosphorylates FANCD2 and FANCI via its effector kinase Chk1, stabilizing the association of FANCI with DNA and FANCD2 18 . A recent study identified a novel ICL sensor, the E3 ubiquitin ligase UHRF2, which, along with its paralogue UHRF1, interacts with FANCD2 after its recruitment to DNA and promotes the retention of FANCD2 at the ICL site 19 .
The downstream effector proteins comprise endonucleases and repair proteins that contribute to ICL repair. Coordination of nuclease activity begins with SLX4 (FANCP), which functions as a scaffold, modulator and docking platform for the structure-specific nucleases SLX1, MUS81-EME1 and XPF-ERCC1 5,20 . The FAN1 nuclease contains an N-terminal UBZ domain that acts as a platform for binding to FANCD2. The monoubiquitinated ID complex recruits this endonuclease complex to the ICL site, where it makes incisions that unhook the crosslinked bases 16 . Studies identified XPF-ERCC1 as the most important of these nucleases for ICL resistance 5 . Unhooking is necessary to initiate translesion synthesis (TLS). The FANCJ helicase binds to FANCD2 and regulates chromatin localization 21 .
The TLS polymerases, REV1, the REV3-REV7 (Pol ζ) complex and DPOLN (DNA polymerase ν), are subsequently recruited to carry out lesion bypass. REV1 inserts the first nucleotide opposite the damaged base, and REV3 performs the extension step. REV3 is particularly specialized in extending distorted base pairs (such as mismatches resulting from inaccurate base insertion by another TLS polymerase). REV7, a recently identified FA gene, acts as an adaptor between REV1 and REV3 22 ; together they perform lesion bypass and are considered the main players in post-replication repair 23 . DPOLN, an A-family polymerase, can carry out both insertions and extensions across a diverse set of minor- and major-groove ICLs without stalling replication forks 24 . In TLS by Y-family polymerases (such as Pol η), monoubiquitinated PCNA is required to recruit Pol η to the lesion site and to ensure accurate replicative bypass 25 . In mammals, TLS involving DPOLN is controlled by the FA core complex and is independent of PCNA ubiquitination 26 . FANCO and FANCU are components of the RAD51 paralog complex BCDX2, which acts together with RAD51 27 . The BCDX2 complex contains RAD51B, RAD51C, RAD51D and XRCC2; it binds to single-stranded DNA, to nicks in duplex DNA and to single-stranded regions in duplex DNA 28 . FANCO is involved in forming RAD51 foci in response to damage 29 . It facilitates phosphorylation of the checkpoint kinase CHEK2 and thereby transduction of the damage signal, leading to cell cycle arrest and HR activation 26 . XRCC2, classified as an FA gene (FANCU), stabilizes RAD51 27 . The FA components BRCA1, BRCA2 and PALB2 take part in the subsequent HR steps of DNA repair 5 . BRCA1 functions upstream of BRCA2 and recruits PALB2 30 . BRCA2 helps load RAD51 onto DNA through the interaction of its BRC4 repeat with the RecA domain of RAD51 31 . PALB2 binds directly to both BRCA1 and BRCA2, and together they remove the CMG helicase from stalled replication forks 7 .
The end of the FA pathway is marked by deubiquitination of the FANCD2/FANCI heterodimer by the USP1-UAF1 complex 32 . This is carried out through the interaction of the deubiquitylating enzyme USP1 and its partner UAF1 with FANCD2 on chromatin. The ID complex is deubiquitylated by USP1, whose activity is enhanced by the interaction with UAF1 33 .
ICL repair by the FA pathway is best studied in humans 2 and mice 35 (Fig. 1). However, studies show that FA core binding proteins are present in all classes of sponges, pointing to an ancient origin of the FA pathway in animal evolution 36 . The nematode Caenorhabditis elegans has a functional ortholog of the DEAD-box helicase FANCJ, termed DOG-1 (Deletions of G-rich DNA), which functions alongside FANCD2, FANCM, FANCO and FANCI and maintains the stability of G-rich DNA 37 . The function of FA proteins was also recently studied in Drosophila melanogaster, where monoubiquitination of FANCD2 was linked to a mitosis-specific DNA double-strand break (DSB) response 38 . MHF1 and MHF2 were previously identified as anti-crossover factors during meiosis in Arabidopsis thaliana 13 . Singh and colleagues identified FANCC as another anti-crossover gene and showed that the FANCC-FANCE-FANCF subcomplex is conserved from vertebrates to plants and regulates meiotic recombination 13 . There have also been attempts to study the FA pathway in other non-animal organisms. For instance, studies in Saccharomyces cerevisiae report putative homologs of the FA activation proteins MHF1 and MHF2 (CENPS and CENPX, respectively) as well as of the FA core binding proteins FANCM, FANCJ and FANCP (MPH1, CHL1 and SLX4, respectively) 4,39 . However, yeast cells deficient in these protein-coding genes displayed no significant sensitivity to ICL. Instead, they require a combination of different DNA repair systems (NER, HR and post-replication repair) to mitigate ICL 4 .
In light of these studies, there is substantial evidence for the FA pathway beyond mammals, albeit with some proteins missing. The discoveries in amoebozoa, non-mammalian animals and fungi led us to ask whether the FA pathway exists in ancestral Opisthokonts and early diverging fungal (EDF) lineages. Here, we conduct a genomic survey of FA proteins across the fungal tree of life. We propose a hypothetical model of the FA pathway for distinct evolutionary lineages and recapitulate the evolutionary history of FA complexes across taxa.

Distribution of proteins
Genes encoding FA pathway proteins are conserved in evolution because of their involvement in cellular homeostasis. To recover the distribution of individual FA pathway components, we mapped 40 reference proteins onto a collection of 183 fungal and five early diverging Opisthokonta proteomes (protein sets derived from whole-genome sequencing projects) (https://doi.org/10.5281/zenodo.10911400, Supplementary Table S1). Of these 40 FA reference sequences, 32 have homologs in fungal proteomes (Fig. 2, Supplementary Results).
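The mapping step can be illustrated with a minimal Python sketch that turns homology-search hits into a per-proteome copy-number table and lists the reference proteins with at least one homolog. The hit format (reference, proteome, target ID, e-value), the e-value cutoff of 1e-5 and all names in the example are assumptions for illustration; the search tool and thresholds actually used in the study are not described in this section.

```python
"""Sketch: turn homology-search hits into a presence/copy-number table.

Assumptions (hypothetical, not the authors' pipeline): each hit is a
(reference, proteome, target_id, e_value) tuple, and a hit counts as a
homolog when e_value <= E_VALUE_CUTOFF.
"""

E_VALUE_CUTOFF = 1e-5  # hypothetical threshold, not taken from the study


def copy_number_table(hits, references, proteomes, cutoff=E_VALUE_CUTOFF):
    """Return {reference: {proteome: number of distinct homologous targets}}."""
    targets = {ref: {p: set() for p in proteomes} for ref in references}
    for ref, proteome, target_id, e_value in hits:
        if ref in targets and proteome in targets[ref] and e_value <= cutoff:
            targets[ref][proteome].add(target_id)
    # Distinct target IDs, so repeated hits to one protein count as one copy.
    return {ref: {p: len(ids) for p, ids in row.items()}
            for ref, row in targets.items()}


def references_with_homologs(table):
    """Reference proteins with at least one homolog in any proteome."""
    return sorted(ref for ref, row in table.items() if any(row.values()))


if __name__ == "__main__":
    # Toy input: proteome labels and target IDs are invented for illustration.
    refs = ["FANCD2", "FANCI", "FANCC"]
    proteomes = ["EDF_example", "Dikarya_example"]
    hits = [
        ("FANCD2", "EDF_example", "prot_001", 1e-40),
        ("FANCI", "EDF_example", "prot_002", 1e-30),
        ("FANCD2", "Dikarya_example", "prot_101", 0.5),  # fails the cutoff
    ]
    table = copy_number_table(hits, refs, proteomes)
    print(references_with_homologs(table))  # ['FANCD2', 'FANCI']
```

Counting distinct target IDs rather than raw hits keeps a single protein that matches a reference several times from being reported as several gene copies.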

Transcriptomics of FA proteins
FA proteins expressed in pure culture transcriptomic studies of EDF
Since most of the identified homologs of FA components are annotated as hypothetical proteins without experimental characterization, we examined their expression levels in the available EDF whole transcriptomes. The presence of transcripts serves as supporting evidence that the predicted fungal FA homologs originate from genes that are active under normal conditions. The analyzed transcriptomes corroborate the expression of 27 of the 32 identified FA homologs in representatives of Mucorales (M. lusitanicus, J. flammicorona), Endogonales (Endogone sp.), Umbelopsidales (U. isabellina) and Mortierellomycotina (L. transversale).

FA proteins expressed in condition-specific transcriptomic studies of EDF
To test the functionality of the FA candidate genes, that is, whether they are expressed and regulated in response to environmental conditions, we analyzed the expression profiles of the corresponding genes in transcriptomic datasets obtained under various environmental conditions. In total, 26 of the 32 genes encoding homologs of FA pathway components were expressed under multiple conditions. This result suggests that the genes are actively regulated and may be involved in stress responses.
… FANCW and downregulation of MHF1, MHF2, UAF1 and FANCV (Fig. 3). By contrast, a similar study on R. microsporus (M2) showed upregulation of only UBE2T. We also found transcriptomic studies of R. delemar centered on host-pathogen interactions in mucormycosis. In one study, A549 airway epithelial cells were infected in vitro with R. delemar for 6 h (M4) and 16 h (M5) 42 . MHF1 was upregulated at both time points. In another study, mouse bone marrow-derived macrophages (BMDMs) were infected with R. delemar for 1 h, 4 h and 18 h 43 ; we analyzed FA gene expression in the samples infected for 18 h (M6). MHF1, FAN1, REV3, RAD51, UAF1 and FANCV were upregulated, while FANCJ, FANCL, UBE2T and FANCW were downregulated (Fig. 3).
The study of G. rosea treated with G24 (G1), a synthetic strigolactone analog, addressed gene regulation in response to plant signals during the switch from asymbiotic to presymbiotic growth 44 . Our analysis showed upregulation of MHF1, UBE2T (multiple copies), UHRF1, FANCD2, FANCI, EME1, REV3 and FANCV, whereas SLX1 and FANCU were downregulated (Fig. 3).
Two different transcriptomic studies were available for R. irregularis. One concerned the transcriptomic profiling of arbuscular mycorrhizal (AM) roots exposed to varying phosphate concentrations (20 μM, 100 μM, 300 μM, 500 μM) (G2) 45 , in which multiple copies of UBE2T were downregulated. The other focused on differential expression in R. irregularis during AM symbiosis, with strigolactone-treated spores harvested one day (G3) and one week (G4) after inoculation 46 . Spores harvested one day post inoculation showed upregulation of FANCO, UHRF1, UHRF2, SLX1, EME1 and RAD51 and downregulation of UBE2T, whereas in spores harvested one week post inoculation UHRF1 and RAD51 were upregulated (Fig. 3).
We also examined the expression profiles of genes involved in the NER, MMR and DSB repair pathways to assess whether they correlate with the up- or downregulation of FA genes. In contrast to the FA genes, all NER, MMR and DSB genes were upregulated in R. delemar, M. lusitanicus and R. microsporus, regardless of environmental conditions. In G. rosea and R. irregularis, however, genes involved in DSB repair were notably downregulated, while those involved in NER and MMR were upregulated.

FA genes not expressed in transcriptomic analysis
In general, at least one gene from every FA pathway subcomplex was detected in the transcriptomes of pure-culture and treated fungal species. Transcriptomes from axenic cultures of Mucoromycotina and Glomeromycotina members lacked transcripts of the genes encoding the endonucleases EME1, SLX4 and XPF (https://doi.org/10.5281/zenodo.10911400, Supplementary Table S1). Conversely, transcriptomes from condition-specific experiments lacked transcripts of the genes encoding ATR, XPF, ERCC1, SLX4, REV1 and UBP1 (not shown in Fig. 3).

Genomic co-occurrence of FA genes
Fungal genes involved in a common metabolic process are often clustered 47 . We found that several FA pathway genes are located on the same contig. Usually only two or three FA genes were colocalized; however, up to ten FA genes co-occurred on a single contig in selected Ascomycota (https://doi.org/10.5281/zenodo.10911400, Supplementary Table S1). Colocalization with another FA gene within 250 kb was observed for 18% of Dikarya FA genes and for 19% of EDF FA genes. The 250 kb threshold was chosen because biologically important interactions, such as contacts between enhancers and promoters, occur within this range 48 . A hypergeometric test performed on 56 genome datasets showed that the observed co-occurrences of FA genes are not expected at random (highest P value = 0.009) (https://doi.org/10.5281/zenodo.10911400, Supplementary Table S1). Co-occurrences at distances greater than 250 kb were classified as long-range colocalization patterns 49 . In a minority of taxa (7/36 Dikarya and 30/116 EDF), no FA gene shared a contig with any other FA gene (https://doi.org/10.5281/zenodo.10911400, Supplementary Table S1). Not all FA genes are equally likely to co-occur with others: FANCA, FANCE, UHRF1, UHRF2, BRCA1, REV1 and UAF1 had no other FA gene within 250 kb (Fig. 4A). In Chytrids and Blastocladiomycota, FA genes co-occurred in groups of four, with distances between genes ranging from 200 to 300 kb (https://doi.org/10.5281/zenodo.10911400, Supplementary Table S1).
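The distance criterion can be illustrated with a short Python sketch that splits FA gene pairs sharing a contig into colocalized (up to 250 kb) and long-range (more than 250 kb) pairs, and computes the fraction of FA genes with at least one FA neighbour within 250 kb. It assumes edge-to-edge distances between loci, since the text does not state how distances were measured, and the coordinates in the example are invented.

```python
from collections import defaultdict
from itertools import combinations

THRESHOLD_BP = 250_000  # 250 kb colocalization cut-off used in the study


def gap_bp(a_start, a_end, b_start, b_end):
    """Edge-to-edge gap between two loci in bp (0 if they overlap).

    Assumption: distances are measured between gene edges, not gene starts.
    """
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def _by_contig(loci):
    """Group (index, start, end) records by contig name."""
    groups = defaultdict(list)
    for i, (_, contig, start, end) in enumerate(loci):
        groups[contig].append((i, start, end))
    return groups.values()


def classify_pairs(loci, threshold_bp=THRESHOLD_BP):
    """Split same-contig pairs into colocalized and long-range lists.

    loci: list of (gene_name, contig, start, end) tuples.
    Returns two lists of (gene_a, gene_b, gap_bp) tuples.
    """
    colocalized, long_range = [], []
    for members in _by_contig(loci):
        for (i, s1, e1), (j, s2, e2) in combinations(members, 2):
            d = gap_bp(s1, e1, s2, e2)
            target = colocalized if d <= threshold_bp else long_range
            target.append((loci[i][0], loci[j][0], d))
    return colocalized, long_range


def fraction_colocalized(loci, threshold_bp=THRESHOLD_BP):
    """Fraction of loci with another FA locus within threshold on the same contig."""
    near = set()
    for members in _by_contig(loci):
        for (i, s1, e1), (j, s2, e2) in combinations(members, 2):
            if gap_bp(s1, e1, s2, e2) <= threshold_bp:
                near.update((i, j))
    return len(near) / len(loci) if loci else 0.0


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    loci = [
        ("FANCD2", "contig_1", 10_000, 14_000),
        ("FAN1", "contig_1", 120_000, 123_000),
        ("USP1", "contig_1", 900_000, 902_000),
        ("FANCI", "contig_2", 5_000, 9_000),
    ]
    near, far = classify_pairs(loci)
    print(near)  # [('FANCD2', 'FAN1', 106000)]
    print(fraction_colocalized(loci))  # 0.5
```

Loci are tracked by index rather than by name, so duplicated genes that share a name (for example several UBE2T copies) are still counted separately.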
In Dikarya, we mainly observed long-range colocalization between FA genes. However, UHRF1, UHRF2, DPOLN, REV1 and UAF1 did not co-occur with other FA genes, even at large distances, in any fungal group (Fig. 4B).
Genes encoding directly interacting proteins, such as FANCD2 and FANCI, were not located together in any of the analyzed genomes. However, across all EDF, these genes tend to co-occur with a gene encoding a ubiquitination, endonuclease or deubiquitination protein.
Genes encoding endonucleases are the FA genes that most often co-occur with other FA components; notably, the gene for REV3 is located close to the genes for UBE2T or ATR (Fig. 4).

Differences in domain architecture
For 30 of the 32 predicted FA proteins, fungal sequences clustered together with the animal reference sequences. However, SLX4 and FANCJ sequences formed separate clusters. Further analysis revealed that fungal SLX4 and FANCJ homologs differ in length and domain composition, which points to a possible change in protein function or specificity in fungi.
Across diverse fungal lineages, SLX4 harbors an N-terminal SAP domain and a C-terminal SLX4 domain (Fig. 5). By contrast, human SLX4 is a 1834 amino acid protein that, in addition to these two domains, contains three further domains in its N-terminal region: UBZ, MLR and BTB/POZ. The BTB/POZ domain is present in animal SLX4 homologs, including those of sponges, but absent from all fungal homologs. The UBZ and MLR domains are also absent in fungi. Moreover, fungal homologs are substantially shorter than their animal counterparts, with lengths ranging from 151 aa in Allomyces macrogynus

Phylogenetics of Fanconi Anemia proteins
Phylogenetic trees for the 32 FA proteins agree with the species tree, consistent with vertical inheritance of housekeeping genes in eukaryotes (https://doi.org/10.5281/zenodo.10911400, Supplementary Materials SM1, Supplementary Dataset SD1). Most FA proteins occur as a single copy per proteome. A conserved domain architecture was observed in all analyzed Opisthokonts, with the exception of the FANCJ helicase and the SLX4 scaffold protein. In addition to differences in protein length and domain architecture, gene duplications were observed in members of Blastocladiomycota, Entomophthoromycotina and Mucoromycotina. For instance, Allomyces macrogynus has two copies each of FANCL, FANCM, FANCO, ATR, UHRF1, UHRF2, SLX4 and FAN1, and four copies each of FANCJ and ERCC1. A series of duplications was observed in six FA proteins in Entomophthora muscae, namely FANCD2, FANCL, UBE2T, SLX1, ERCC1 and FAN1. The MUS81 endonuclease was found in ten copies in J. flammicorona. In EDF species, duplications were also observed for the ID proteins (FANCD2-FANCI), the FANCJ helicase and the endonucleases MUS81, SLX1 and FAN1.
Taking into account all homologs of FA components found in EDF, a model of minimal FA machinery can be proposed (depicted for Glomeromycotina and Chytridiomycota as examples) (Fig. 6). The FANCM-MHF1-MHF2 activation complex could signal the core proteins FANCL and FANCJ to bind to the ICL site, bringing together the ubiquitination proteins. This would be followed by monoubiquitination of FANCD2 and FANCI to form the ID complex, which, along with FANCJ, would enable the assembly of endonucleases around the ICL site. Unhooking of the ICL would be followed by TLS and by deubiquitination of the ID complex by the USP1-UAF1 complex. The final step would be completed by HR proteins together with proteins from the NER and DSB repair pathways. In Dikarya, the absence of the ID proteins might allow the Pso2 nuclease and Hrq1 helicase to associate with the FANCJ helicase and the FA endonuclease components to complete ICL unhooking.
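The proposed model can be written down as an ordered list of steps with their components, so that a given set of detected homologs can be checked for gaps. In this sketch the step grouping follows the model described above, whereas the rule that a single nuclease or TLS polymerase is enough to support its step is a simplifying assumption, not a claim of the study; the example homolog set is hypothetical and lacks FANCM.

```python
"""Sketch: check which steps of the proposed minimal FA model a set of
detected homologs can support.

Assumption: for the unhooking and TLS steps, one detected member is treated
as sufficient ("any"); all other steps require every listed component ("all").
"""

MINIMAL_FA_MODEL = [
    ("activation", "all", {"FANCM", "MHF1", "MHF2"}),
    ("core binding", "all", {"FANCL", "FANCJ"}),
    ("ubiquitination", "all", {"UBE2T"}),
    ("ID complex", "all", {"FANCD2", "FANCI"}),
    ("unhooking", "any", {"SLX1", "SLX4", "MUS81", "EME1", "FAN1", "XPF", "ERCC1"}),
    ("TLS", "any", {"REV1", "REV3", "REV7", "DPOLN"}),
    ("deubiquitination", "all", {"USP1", "UAF1"}),
]


def unsupported_steps(detected, model=MINIMAL_FA_MODEL):
    """Return [(step, missing components)] for steps the homolog set cannot support."""
    detected = set(detected)
    gaps = []
    for step, rule, components in model:
        present = components & detected
        supported = present == components if rule == "all" else bool(present)
        if not supported:
            gaps.append((step, sorted(components - detected)))
    return gaps


if __name__ == "__main__":
    # Hypothetical homolog set lacking FANCM.
    detected = {"MHF1", "MHF2", "FANCL", "FANCJ", "UBE2T", "FANCD2", "FANCI",
                "MUS81", "REV3", "USP1", "UAF1"}
    print(unsupported_steps(detected))  # [('activation', ['FANCM'])]
```

Keeping the steps in a list preserves their order in the pathway, so reported gaps appear in the same sequence as the model.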

Discussion
In this study, we show that the FA pathway is partially present in basal Opisthokonts such as S. rosetta, C. owczarzaki, M. brevicollis and S. arctica, as well as in many fungal lineages, especially EDF. Diverse traits were recently reported to persist in EDF long after their separation from the common ancestor with animals 52 . The FA pathway now extends the list of known ancient pathways retained in fungi.
The work of Tao and coworkers revealed that MHF1/2 function requires a stable association with FANCM 12 . The archaeal FANCM homolog (Hef protein) resolves replication forks and possesses both helicase and endonuclease activities 53 . The FANCM ortholog of Schizosaccharomyces pombe (Fml1) also has FA-independent roles in the DNA damage response: FANCM-MHF works in parallel with EME1-MUS81 to process meiotic joint DNA molecules, limiting crossovers 54 . By contrast, the FANCM ortholog in A. thaliana was found to have no direct role in ICL repair but to be necessary for HR pathways in somatic cells 55 . FANCM also promotes RAD51-dependent gene conversion at stalled replication forks 56 . The absence of FANCM in Glomeromycotina therefore raises the question of how these fungi activate the FA pathway and carry out the subsequent DNA repair. Endonucleases are the most conserved components across all lineages, consistent with their involvement in universal repair pathways beyond FA. FA-specific proteins forming the core binding complex have a patchy taxonomic distribution, with the complete set limited to mammals. The currently known ensemble of core binding proteins likely originated in the vertebrate ancestor. The work of Alpi et al. (2008) demonstrated that FANCL is the crucial E3 ligase subunit and that UBE2T, FANCL and FANCI are sufficient for robust FANCD2 monoubiquitination 17 . They also speculated that flies, worms and Dictyostelium may possess a simplified FA monoubiquitination pathway with just three components: UBE2T, FANCL and FANCI. The conservation of FANCD2, FANCI, UBE2T and FANCL in EDF (Fig. 2), along with their expression under stress conditions (Fig. 3), provides evidence of a simplified, functioning core complex. Ubiquitination of both FANCD2 and FANCI forms a dual ubiquitin-locking mechanism needed for ID complex function 57 . The ID complex is retained in EDF but lost in Dikarya (Fig. 2). Its absence in Dikarya might explain the lack of ICL sensitivity in yeasts deficient in FA homologs. It is also possible that other DNA repair systems are involved in ICL repair in this group of organisms.
A significant part of the FA pathway is the unhooking step, carried out mostly by the SLX1-SLX4 complex. Studies of human SLX4 have shown that loss of its UBZ domain (which recruits SLX4 to the ICL site) abolishes FA-independent interactions and increases the number of chromosomal aberrations 58 . The absence of the UBZ domain throughout the fungal kingdom leaves open the question of how SLX4 is docked (Fig. 5). Interestingly, SLX4 transcripts were detected under neither normal nor stress conditions.
The transcriptomic analysis carried out in this study showed that SLX1 is highly upregulated in Rhizophagus irregularis associated with Medicago truncatula treated with strigolactone for 24 h. This result suggests that SLX1 may partner with another inactive nuclease for substrate binding instead of SLX4, or that SLX4 functions differently in EDF than in mammals. Exposure of R. delemar and R. microsporus spores to phagocytosis by murine macrophages is accompanied by upregulation of FA core and repair genes together with activation of catabolic pathways, nucleic acid binding, transcription and polymerase-mediated regulation, indicating the involvement of genome maintenance pathways 41 . The association of AM fungi with plant roots triggers plant host defense mechanisms. We observed downregulation of FA genes in R. irregularis in response to its colonization of Lotus japonicus roots at phosphate concentrations of 20, 100, 300 and 500 μM (Fig. 3). This suggests that higher phosphate concentrations suppress FA gene expression, coupled with a decrease in AM fungal colonization in Glomeromycotina.
Yeasts lack most FA pathway components and counter the ICL threat with a replication- and recombination-independent mechanism. In this mechanism, endonucleolytic unhooking depends on the Pso2 nuclease, facilitated by the Hrq1 helicase 51 , and is followed by translesion synthesis. This pathway may be responsible for ICL removal in FA pathway-deficient cells 59 . We found Pso2 homologs in all classes of fungi except Microsporidia, which opens up the possibility of an alternative to the FA pathway in most fungal taxa. The presence of Pso2 is the basis of our model for ICL repair in Saccharomycotina, which could in general be extended to Dikarya (Fig. 6).
FANCJ consists of an N-terminal DEAH/DEAD box helicase domain and a C-terminal Rad3 helicase domain. The DEAD box helicase domain is missing from some FANCJ homologs in EDF. This may reflect a genuine domain truncation or a gene-calling error that missed the first exon in some taxa. Studies report that the DEAH/DEAD box domain binds to helix extensions and helps in target recognition and unwinding 60 . Lack of the DEAD box helicase domain may reduce the catalytic activity of FANCJ in some EDF. However, it is also possible that the Rad3 helicase domain alone can perform the necessary function.
The final step of ICL repair involves DNA polymerase ν (DPOLN), which has been considered vertebrate-specific. DPOLN depletion results in low HR efficiency and increased sensitivity to ICL-inducing agents 23 . Given the occurrence of DPOLN in Chytridiomycota, we can speculate that this protein is of ancestral origin. Huang and Cook reported that the rate of HR-mediated DNA repair is not the same across fungal species 61 . The uneven distribution of HR proteins across the fungal tree likely explains their observation.
The duplications observed in the phylogenetic trees of FA proteins are likely attributable to whole-genome duplication or polyploidy (as in A. macrogynus), a larger genome size (240 Mb in J. flammicorona), or genomes rich in repetitive elements with potential functional diploidy (as in E. muscae) 62 .
We found previously unreported FA components conserved beyond fungi and mammals. D. discoideum is remarkably resistant to DNA damage 63 and was previously reported to have a minimal FA pathway consisting of FANCD2, FANCI, FANCL, FANCM, FANCJ and UBE2T, components that possibly evolved in the last eukaryotic common ancestor 64 . We additionally found the FA members MHF1, FANCO, ATR, FAN1, MUS81, XPF, ERCC1, REV1, REV3, USP1 and UAF1 (Fig. 2) in the social amoeba proteome (https://doi.org/10.5281/zenodo.10911400, Supplementary Table S1), which supports the hypothesis of an early evolutionary origin of the FA pathway. Another experimental study on D. discoideum showed that the excision repair nuclease XPF is necessary to repair ICL 64 . Early animals, including sponges, also possess a relatively complete FA pathway (Fig. 2).
Interestingly, we observed a rather depleted FA protein repertoire in arthropods. The presence of the FA core binding proteins FANCJ, FANCL and FANCM, together with the conserved functions of FANCD2 and FANCI in D. melanogaster, points to a reduction of the FA pathway in arthropods 38 . It was also found that in flies each nuclease can act individually, without forming a complex, during ICL repair 65 . Based on our results in nematodes, we speculate that they possess a functional equivalent of the FA pathway for ICL repair, given the conservation of the activation proteins and ID complex proteins, along with FANCJ, FANCM and FANCO, all of which are essential for pathway progression and ICL repair.
The co-occurrence of FA pathway genes in fungal genomes (Fig. 4) suggests their involvement in a common network. In fungi, genomic proximity is often linked to co-expression of genes that the cell needs for a single process. This non-random organization of FA genes further supports the possibility of a functionally active FA repair system for ICL in EDF.
The colocalization of ID complex genes with ubiquitination, endonuclease and DNA repair genes is consistent with the chronology of events in the pathway (https://doi.org/10.5281/zenodo.10911400, Supplementary Table S1). This is particularly visible in Mucorales and Mortierellales.
Despite the divergence of Dikarya around 650 million years ago 66 , the genomic proximity of FA components is preserved, particularly in Ascomycota (https://doi.org/10.5281/zenodo.10911400, Supplementary Table S1). The co-localization of FA genes across the fungal tree of life (fToL) suggests evolutionary pressure on these genes to cluster together and maintain a functional pathway, perhaps for genome maintenance.
In most members of Chytridiomycetes, two FA pathway components are fused in a single protein (https://doi.org/10.5281/zenodo.10911400, Supplementary Table S1), suggesting a gene fusion event during chytrid evolution. However, these proteins are atypically long (1800-1900 aa), and only one transcript in the EST database (XM_016756588.1: Spizellomyces punctatus DAOM BR117) supports such a gene model, with both protein domains present in a single protein. This does not rule out gene fusion, since transcriptomic data for Chytridiomycota are scarce. Regardless of whether the fusion is genuine, the FANCM and XPF endonuclease genes are in close genomic proximity in this taxon.
The lack of genomic proximity of FA core genes in Glomeromycotina can be attributed to the high activity and proliferation of transposons in their genomes, leading to genome reshuffling. The high interspecific diversity of Glomeromycotina genomes affects all known protein domains 67 .
The observed differences in FA conservation among Opisthokonta are summarized in the proposed model of the FA repair pathway in Glomeromycotina and Chytridiomycota (Fig. 6). We speculate that a minimal FA pathway promoting ICL repair is present in early diverging fungi. The expression of genes encoding proteins of the NER, DSB and MMR pathways co-occurs with the expression of FA components, opening the possibility that these pathways are coordinated to maintain genome stability in EDF. The presence in EDF of FANCE and FANCL, proteins that play a crucial role in mediating monoubiquitination and formation of the ID complex, could be sufficient to kick-start an active FA pathway. The absence of these proteins, coupled with the absence of an ID complex in Dikarya, could explain the insensitivity to ICL reported in yeasts despite the discovery of FA homologs 4 . Numerous animal models of the FA pathway have been developed, yet no model of FA pathway protein dynamics has been proposed for the fungal kingdom. Identification of putative homologs of FA proteins in EDF groups could pave the way for understanding the biology of ICL repair across the fungal tree of life.
Taken together, our study points to the ancient origin of the FA pathway and its conservation beyond mammals. At the same time, the pathway was shaped by massive gene loss in model animals and Dikarya. We hypothesize the existence of a minimal form of the Fanconi Anemia pathway in the early diverging fungal lineages.
Differential expression analysis of protein-coding genes in the condition-specific transcriptomic datasets was carried out using the DESeq2 R package 89 . RNA-Seq reads were mapped onto the FA protein-coding gene sequences, and the adjusted P value (Padj) threshold was set to ≤ 0.05. Genes with a log2 fold change < 0 were classified as downregulated and those with a log2 fold change > 0 as upregulated. Using the same approach, we also examined the expression profiles of genes involved in the NER, DSB and mismatch repair (MMR) DNA repair pathways.
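The classification criteria can be sketched in Python as a post-processing step on a DESeq2 results table exported to CSV. DESeq2's results() output does contain log2FoldChange and padj columns, with padj reported as NA for genes removed by independent filtering; the added gene column, the CSV export and the example values are assumptions for illustration.

```python
"""Sketch: call up/downregulation from an exported DESeq2 results table.

Assumptions: the table was exported to CSV with an added column named
"gene" (hypothetical), next to DESeq2's own "log2FoldChange" and "padj"
columns; R writes missing values as "NA".
"""
import csv
import io
import math

PADJ_CUTOFF = 0.05  # Padj <= 0.05, as stated in the Methods


def _to_float(text):
    """Parse a numeric field, mapping empty or NA values to NaN."""
    return math.nan if text in ("", "NA") else float(text)


def classify_gene(log2_fold_change, padj, cutoff=PADJ_CUTOFF):
    """Return 'upregulated', 'downregulated', 'unchanged' or 'not significant'."""
    if math.isnan(padj) or padj > cutoff or math.isnan(log2_fold_change):
        return "not significant"
    if log2_fold_change > 0:
        return "upregulated"
    if log2_fold_change < 0:
        return "downregulated"
    return "unchanged"


def read_calls(csv_text, cutoff=PADJ_CUTOFF):
    """Map each gene in the CSV text to its expression call."""
    calls = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        calls[row["gene"]] = classify_gene(
            _to_float(row["log2FoldChange"]), _to_float(row["padj"]), cutoff)
    return calls


if __name__ == "__main__":
    # Invented values for illustration only, not data from this study.
    example = (
        "gene,baseMean,log2FoldChange,padj\n"
        "MHF1,812.4,1.9,0.001\n"
        "UBE2T,230.0,-0.8,0.03\n"
        "SLX4,12.1,0.4,NA\n"
        "FAN1,95.0,2.2,0.2\n"
    )
    for gene, call in read_calls(example).items():
        print(gene, call)
```

Treating NA as "not significant" mirrors how DESeq2 flags genes it did not test, rather than silently dropping them from the table.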
To determine the genomic localization of FA protein-coding genes, contig names and coordinates were retrieved from NCBI using the Entrez Direct (EDirect) tools. Genes occurring on a single contig were grouped together and their pairwise distances were calculated. Locus pairs separated by up to 250 kb were classified as colocalized 48 , whereas pairs separated by more than 250 kb were classified as long-range colocalization patterns 49 . We tested the probability of randomly finding two FA genes in a 250 kb window by taking into account the average gene length, the distance between genes and the gene number of a given genome. To this end, we applied a hypergeometric test using values derived from GFF files, 56 of the 183 of which could be downloaded. Hypergeometric probabilities were calculated for each of the 56 GFF files, and P < 0.01 was used as the filtering criterion. The frequency of co-localization was computed and the results were visualized using Cytoscape v3.10.1 90 .
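One possible parameterisation of the hypergeometric test is sketched below: a genome of N genes contains K FA genes, a 250 kb window holds approximately n = 250 kb / (mean gene length + mean distance between genes) genes, and the probability of finding at least k FA genes in one window is the hypergeometric upper tail, P(X ≥ k) = Σ_{i=k}^{min(K,n)} C(K,i)·C(N−K,n−i) / C(N,n). The Methods name these inputs but not the exact formula, so this parameterisation and the genome statistics in the example are assumptions.

```python
"""Sketch: chance of finding >= k FA genes in one genomic window.

Assumed model (hypothetical parameterisation): a genome with N genes, K of
them FA genes; a window of window_bp holds n = window_bp / (mean gene
length + mean gap) genes; X, the number of FA genes in the window, is
hypergeometric.
"""
from math import comb

WINDOW_BP = 250_000  # 250 kb window, as in the Methods


def genes_per_window(mean_gene_len, mean_gap, window_bp=WINDOW_BP):
    """Approximate number of genes that fit into one window (at least 1)."""
    return max(1, round(window_bp / (mean_gene_len + mean_gap)))


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


def colocalization_pvalue(n_genes, n_fa_genes, mean_gene_len, mean_gap,
                          k=2, window_bp=WINDOW_BP):
    """P-value for observing at least k FA genes in one window of a genome."""
    n = genes_per_window(mean_gene_len, mean_gap, window_bp)
    return hypergeom_sf(k, n_genes, n_fa_genes, n)


if __name__ == "__main__":
    # Invented genome statistics, for illustration only.
    p = colocalization_pvalue(n_genes=10_000, n_fa_genes=30,
                              mean_gene_len=1_500, mean_gap=1_000, k=3)
    print(f"P(X >= 3) = {p:.4f}; passes P < 0.01: {p < 0.01}")
```

The tail is summed exactly with integer binomial coefficients from math.comb; with SciPy installed, scipy.stats.hypergeom.sf(k - 1, N, K, n) should give the same value.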

Data availability
All metadata processed in this study are deposited in Zenodo: https://doi.org/10.5281/zenodo.10911400. All protein identifiers, genomic assemblies, transcriptomic datasets and hypergeometric test values are listed in Supplementary Table S1. Figures of the phylogenetic trees for FA proteins are provided in Supplementary Materials SM1, and the trees are available in Newick format in Supplementary Dataset SD1. Supplementary Results provide a detailed commentary on the taxonomic distribution depicted in Fig. 2.

Figure 1. Schematic representation of the Fanconi Anemia pathway in humans, derived from 34 . [I-Activation of the FA pathway; II-Binding of the FA core complex to the ICL site; III-Monoubiquitination of FANCD2 & FANCI to form the ID complex; IV-Formation of the endonuclease complex and subsequent ICL unhooking; V-Translesion synthesis or lesion bypass; VI-Deubiquitination of the ID complex and closure of the FA pathway.]

Figure 2. Distribution of 40 FA and FA-associated components among selected eukaryotes.
(KNE55791.1, Blastocladiomycota) up to 423 aa in Rhizophagus irregularis (POG76120.1, Glomeromycotina). Mortierellomycotina SLX4 homologs are as long as those of animals (for instance, SLX4 of Haplosporangium bisporale, KAF8951561.1, is 1993 aa long) and also contain an additional N-terminal S2P-M50 domain. Differences were also observed in the domain architecture of FANCJ. FANCJ is a 5′-3′ DNA helicase with a conserved N-terminal DEAH/DEAD box helicase domain and a C-terminal RAD3 helicase domain. Multiple sequence alignment of FANCJ homologs in EDF points towards a deletion of the DEAD box domain. FANCJ homologs in Olpidiomycota and Microsporidia are substantially truncated, with a solo DEAD/DEAH box domain and no C-terminal RAD3 helicase domain. These deviations from the core FANCJ domain architecture might impair its function.

Figure 4. Co-occurrence of FA genes within distances of (A) 250 kb and (B) greater than 250 kb observed in Dikarya and EDF assemblies.

Figure 5. An unrooted Maximum Likelihood phylogenetic tree of the SLX4 scaffold protein across selected eukaryotic lineages, drawn in iTOL 50 .

Figure 6. Speculative FA pathway model for Glomeromycotina and Chytridiomycota compared with Saccharomyces cerevisiae 51 and Homo sapiens 7 . Filled shapes indicate presence of the protein in all members of the group; outlined shapes indicate presence of the protein in 50% of the members of that group.